The Historical Development 
of the Calculus 


C. H. Edwards, Jr. 


With 150 Illustrations 


Springer-Verlag 
New York Berlin Heidelberg London Paris 
Tokyo Hong Kong Barcelona Budapest 


C. H. Edwards, Jr. 
Department of Mathematics 
University of Georgia 
Athens, GA 30602 

USA 


AMS Subject Classification (1991): 26-01, 26-03, 01A45, 01A50, 26A06 


Library of Congress Cataloging-in-Publication Data 
Edwards, Charles Henry, 1937- 
The historical development of the calculus. 
Bibliography: p. 
Includes index. 
1. Calculus—history. I. Title. 
QA303.E224 515’.09 79-15461


Printed on acid-free paper. 


© 1979 by Springer-Verlag New York, Inc. 
Softcover reprint of the hardcover 1st Edition 1979


All rights reserved. This work may not be translated or copied in whole or in part without the 
written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New 
York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly 
analysis. Use in connection with any form of information storage and retrieval, electronic 
adaptation, computer software, or by similar or dissimilar methodology now known or hereaf- 
ter developed is forbidden. 

The use of general descriptive names, trade names, trademarks, etc., in this publication, even 
if the former are not especially identified, is not to be taken as a sign that such names, as 
understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely 
by anyone. 


Production managed by Laura Carlson; manufacturing supervised by Vincent Scelta. 
Printed and bound by R.R. Donnelley & Sons, Crawfordsville, IN. 


ISBN-13: 978-0-387-94313-8 e-ISBN-13: 978-1-4612-6230-5
DOI: 10.1007/978-1-4612-6230-5 


Preface 


The calculus has served for three centuries as the principal quantitative 
language of Western science. In the course of its genesis and evolution 
some of the most fundamental problems of mathematics were first con- 
fronted and, through the persistent labors of successive generations, finally 
resolved. Therefore, the historical development of the calculus holds a 
special interest for anyone who appreciates the value of a historical
perspective in teaching, learning, and enjoying mathematics and its ap- 
plications. My goal in writing this book was to present an account of this 
development that is accessible, not solely to students of the history of 
mathematics, but to the wider mathematical community for which my 
exposition is more specifically intended, including those who study, teach, 
and use calculus. 

The scope of this account can be delineated partly by comparison with 
previous works in the same general area. M. E. Baron’s The Origins of the 
Infinitesimal Calculus (1969) provides an informative and reliable treat- 
ment of the precalculus period up to, but not including (in any detail), the 
time of Newton and Leibniz, just when the interest and pace of the story 
begin to quicken and intensify. C. B. Boyer’s well-known book (1949, 1959 
reprint) met well the goals its author set for it, but it was more ap- 
propriately titled in its original edition—The Concepts of the Calculus— 
than in its reprinting. Boyer gives an excellent account of the historical 
development of the concepts that lie at the foundations of the calculus (as 
opposed to the evolution of the calculus itself as a computational disci- 
pline); his essentially verbal exposition was well adapted to his emphasis 
on what used to be called the “metaphysics” of the calculus.

However, the calculus as a distinct mathematical discipline is not solely
an abstract body of fundamental concepts. It is, above all, the calculating 
instrument par excellence; its greatest successes have always been answers 
to the question “How does one actually compute it?” The solution of 
specific problems has, in the development of the calculus, often played a 
role not unlike that of the experimentum crucis in natural science. Typi- 
cally, a particular problem solution yields, by inference, a general tech- 
nique or procedure which confronts first the question of what new prob- 
lems it can solve, and in turn raises conceptual questions regarding the 
range of its applicability, the answers to which may finally illuminate the 
original problem. In this book I have tried to mirror this complex historical 
process by anchoring my account of the development of fundamental 
concepts and general methods in the computational paradigms that I feel 
have played the central role in the development of the calculus. 

Our connected narrative makes its way (if sometimes unevenly, as 
dictated by the unsteady course of history) from the measurement of land 
area in antiquity to the nonstandard analysis of the twentieth century. The 
first two chapters detail those aspects of Greek mathematics that provided 
the foundation for the development of the calculus. Chapter 3 outlines the 
absorption and transmission of the Greek legacy by the Arab hegemony, 
the medieval scholastic speculations that contributed to an environment 
sympathetic to infinitesimal investigations, and the eventual renaissance of 
mathematical progress in Western Europe. Chapters 4-7 deal with several
ingredients (logarithmic computations, infinitesimal area and tangent 
methods, and infinite series techniques) of the fertile amalgam that fueled 
the mathematical explosion of the later seventeenth century. 

The centerpiece of any history of the calculus will inevitably be its 
treatment of the contributions of Newton and Leibniz. In Chapter 8 I have 
mined the riches of D. T. Whiteside’s monumental edition of Newton’s 
mathematical papers to outline (as would not previously have been possi- 
ble) his calculus researches over a quarter-century period, beginning with 
the plague years 1665-66 that were “the prime of his age for invention.” 
Leibniz seems to me to have been ill served by most English-language 
accounts of his mathematical work. I hope that Chapter 9 will help to
promote a wider and more accurate understanding of the origin and 
distinct motivation of his approach to the calculus. 

Chapter 10 deals with the onrushing technical progress of the eighteenth 
century, exemplified by Euler’s work, and with controversies about the 
meaning of it all. Chapter 11 gives the nineteenth century’s answers to the 
questions of the preceding two centuries. Chapter 12 discusses two 
twentieth-century developments that serve to round out our story. 

A principal feature of this book is the inclusion of exercises interspersed 
throughout the text as an integral part of the exposition. The history of 
mathematics, like mathematics itself, is best learned not by passive read- 
ing, but with pen in hand. Moreover, the solution of problems typical of a 
particular historical period, using the tools of that time, enables the reader
to share (if only at a distance) in the excitement of first discovery. And
what better way to penetrate the thought of Archimedes and Newton than 
to work some of their problems using their own methods? Even so, I 
should point out that the exercises are annotated in a way that permits 
them simply to be read (rather than be worked) like footnotes or further 
remarks. Thus I have adopted the systematic use of annotated exercises as 
a convenient device for the inclusion of additional insights and supplemen- 
tary material without a surplus of technical detail. 

I hope that this book encourages further study by raising more questions 
than it answers. Each chapter is provided with its own bibliography, to 
which references for additional reading are made (within square brackets) 
throughout the chapter. Indeed, a book on the history of mathematics can 
serve no finer purpose than to guide its readers to the original sources that 
are the shrines of our subject. 

Although the study of the history of mathematics has an intrinsic appeal 
of its own, its chief raison d’être is surely the illumination of mathematics
itself. For example, the gradual unfolding of the integral concept—from 
the volume computations of Archimedes to the intuitive integrals of 
Newton and Leibniz and finally the definitions of Cauchy, Riemann, and 
Lebesgue—cannot fail to promote a more mature appreciation of modern 
theories of integration. Because of the wide range of elementary mathe- 
matical topics that have contributed to the development of the calculus, I 
have found the sequence of topics covered in this book suitable for an 
introductory history of mathematics course. Moreover, in the context of a 
continuous exposition, I have included throughout the book examples and 
units of material that should be convenient for inclusion in a wide range of 
courses—introductory and advanced calculus, the general history of 
mathematics, and certain precalculus courses. 

I owe a special debt to my wife Alice for actively sharing my enthusiasm 
and interest in this project. In addition to her reading and occasional 
criticism of the manuscript, her constant support and encouragement 
entitles her to a full share of satisfaction at its completion. 


C. H. Edwards, Jr. 


Preface to Second Printing 
(1982) 


In this printing I have corrected a number of misprints and made several 
small changes and improvements. Among the readers who made helpful 
suggestions, I especially would like to thank Professor Sherman K. Stein for 
his advice and assistance.


Area, Number, and
Limit Concepts in Antiquity 


Babylonian and Egyptian Geometry 


The historical origins of what we now call mathematical concepts—those 
that deal with number, magnitude, and form—can be traced to the rise of 
civilizations in the fertile river valleys of China, Egypt, India, and 
Mesopotamia. In particular, fairly detailed and reliable information is now
available concerning the highly organized cultures of the peoples who lived 
along the Nile in Egypt and in the “fertile crescent” of the Tigris and 
Euphrates rivers in Mesopotamia in the early centuries of the second 
millennium B.C.

The Greeks, whose geometrical investigations provided the foundations 
for the development of much of modern mathematics (including the 
calculus), generally assumed that geometry had its origin in Egypt. For 
example, the Greek historian Herodotus (fifth century B.C.) wrote that 
agricultural plots along the Nile were taxed according to area, so that when 
the annual flooding of the river swept away part of a plot and its owner 
applied for a corresponding reduction in his taxes, it was necessary for 
surveyors to determine how much land had been lost. Obviously, this 
would have required the invention of elementary techniques of geometrical 
measurement. 

More direct information is provided by the Egyptian papyri that have 
been rediscovered in modern times. In regard to Egyptian mathematics, 
the most important of these is the Rhind Papyrus which was copied in 
about 1650 B.C. by a scribe named Ahmes who states that it derives from a
prototype from the “middle kingdom” of about 2000 to 1800 B.C. This
papyrus consists mainly of a list of problems and their solutions, about 
twenty of which relate to the areas of fields and volumes of granaries. Each
problem is stated in terms of particular numbers (rather than literal
variables), and its solution carried out in recipe fashion, without explicitly
specifying either the general formula (if any) used or the source or
derivation of the method.


Figure 1

Apparently it is taken for granted that the area of a rectangle is the 
product of its base and height. The area of a triangle is calculated by 
multiplying half of its base times its height. In one problem the area of an 
isosceles trapezoid, with bases 4 and 6 and height 20, is calculated by 
taking half the sum of the bases, “so as to make a rectangle,” and 
multiplying this times the height to obtain the correct area of 100. This and 
similar examples suggest that Egyptian prescriptions for area computations 
may have stemmed from elementary dissection methods involving the idea 
of cutting a rectilinear figure into triangles and then rearranging the parts 
so as to obtain a rectangle. 
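
As a modern check on the trapezoid computation quoted above, here is a minimal Python sketch. The function name `trapezoid_area` is purely illustrative (nothing like it appears in the papyrus); it simply applies the stated recipe of averaging the two bases, "so as to make a rectangle," and multiplying by the height.

```python
from fractions import Fraction

def trapezoid_area(base1, base2, height):
    """Area by the recipe described in the text: average the two bases
    to 'make a rectangle', then multiply by the height."""
    return Fraction(base1 + base2, 2) * height

# The example from the text: bases 4 and 6, height 20.
print(trapezoid_area(4, 6, 20))  # 100
```

Exact rational arithmetic is used only so that the result is free of rounding; the scribe's own arithmetic was of course carried out quite differently.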


EXERCISE 1. Use the dissections suggested by Figure 1 to derive the familiar
formulas for the areas of triangles ($\frac{1}{2}bh$), parallelograms ($bh$), and trapezoids
($\frac{1}{2}(b_1 + b_2)h$).


EXERCISE 2. A later papyrus calculates the area of a quadrilateral (4-sided polygon) 
by multiplying half the sum of two opposite sides times half the sum of the other 
two sides. Does this give the correct result for a trapezoid or parallelogram that is 
not a rectangle? 


EXERCISE 3. (a) In one of the Rhind papyrus problems the area of a circle is
calculated by squaring 8/9 of its diameter. Compare this method with the area
formula $A = \pi r^2$ to obtain the Egyptian approximation $\pi \approx 3.16$.

(b) This very good approximation to $\pi$ may have been obtained as follows.
Trisect each side of the square circumscribed about a circle of diameter d, and cut
off its 4 corners as indicated in Figure 2. Show that the area of the resulting
octagon is

$$A = \frac{7}{9}d^2 = \frac{63}{81}d^2 \approx \left(\frac{8}{9}d\right)^2.$$


During the past half-century a great many mathematical cuneiform
tablets, dating from the Old Babylonian age of the Hammurabi dynasty
(ca. 1800-1600 B.C.), have been unearthed and deciphered. It now appears
that Babylonian mathematics was considerably more advanced than Egyp- 
tian mathematics. For example, the Babylonians were adept at the solution 
of algebraic problems involving quadratic equations or pairs of equations, 
either two linear equations in two unknowns or one linear and one 
quadratic equation. They computed accurate numerical answers using a 
positional sexagesimal (base 60) system of numeration. For example, they 
calculated $\sqrt{2}$ as


$$1 + \frac{24}{60} + \frac{51}{60^2} + \frac{10}{60^3} = 1.414213,$$


which differs by less than 0.000001 from the true value. 
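
The accuracy claim can be checked with a few lines of Python. This is only a sketch: the helper name `sexagesimal_value` is hypothetical, and the digits 1;24,51,10 are taken here as the usual modern reading of the Old Babylonian value for the square root of 2.

```python
from fractions import Fraction
import math

def sexagesimal_value(integer_part, fractional_digits):
    """Value of a base-60 numeral: the integer part plus
    digit/60 + digit/60^2 + ... for the fractional digits."""
    value = Fraction(integer_part)
    for position, digit in enumerate(fractional_digits, start=1):
        value += Fraction(digit, 60 ** position)
    return value

# Assumed digits: 1;24,51,10 in the usual modern transcription.
babylonian_sqrt2 = sexagesimal_value(1, [24, 51, 10])
error = abs(float(babylonian_sqrt2) - math.sqrt(2))

print(babylonian_sqrt2)   # exact value as a fraction
print(error < 1e-6)       # True
```

Three sexagesimal places already give about six correct decimal digits, which is one reason a base-60 positional system was so effective for numerical work.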

In regard to geometry, the Babylonians correctly calculated the areas of 
triangles and trapezoids, and the volumes of cylinders and prisms (as the 
area of the base times the height). They were also familiar (at least on an 
empirical basis), well over a millennium before the time of Pythagoras, with
the so-called Pythagorean theorem to the effect that the sum of the squares 
of the legs of a right triangle is equal to the square of its hypotenuse. In a 
typical Babylonian problem to be solved using this result, a ladder of given 
length would be standing against a wall, and it would be asked how far the 
bottom of the ladder slides away from the wall, if its top is lowered by a 
given distance. 
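
A ladder problem of this kind reduces to a single application of the Pythagorean relation. The sketch below is a modern restatement, not a transcription of any tablet: it assumes the ladder starts flush against the wall (so its top is initially at a height equal to its length), and the function name and the sample values 30 and 6 are illustrative only.

```python
import math

def ladder_slide(length, drop):
    """Distance the foot of a ladder moves out from the wall when its top,
    assumed to start at height `length`, is lowered by `drop`."""
    new_height = length - drop
    # The ladder is the hypotenuse; the new height and the slide are the legs.
    return math.sqrt(length ** 2 - new_height ** 2)

print(ladder_slide(30, 6))  # 18.0
```

With these values the legs and hypotenuse are 18, 24, and 30, a multiple of the 3-4-5 right triangle.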

Just as in the case of the Egyptian papyri, the Babylonian tablets mainly 
present problems solved by means of prescriptions that do not provide the 
basis for their methods. However, the following exercise presents a derivation
of the Pythagorean theorem that would have been well within their
range, because they were familiar with the formula $(a + b)^2 = a^2 + 2ab + b^2$.


EXERCISE 4. Four copies of a right triangle with legs a and b and hypotenuse c, 
together with a square of edge c, are assembled as in Figure 3 to form a square of 
edge a+b. Explain why the assembled figure is a square, and derive the 
Pythagorean relation by computing its area in two different ways.


Figure 3


Figure 4 


EXERCISE 5. The Babylonians generally used $3r^2$ for the area of a circle of radius r,
corresponding to the poor approximation $\pi \approx 3$. Show that this approximation
could have been obtained by averaging the areas of the inscribed and circumscribed
squares in Figure 4.


EXERCISE 6. The Babylonians generally calculated the volume of a frustum of a
cone or pyramid by means of the plausible (?) formula $V = \frac{1}{2}(A_1 + A_2)h$, where h is
its height and $A_1$, $A_2$ the areas of its top and bottom. Show that this formula is
incorrect by calculating the volume of a frustum of height 2, cut from a cone of
height 4 and base radius 2 (Fig. 5). Use the (correct) formula $V = \frac{1}{3}\pi r^2 h$ for the
volume of a cone.
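
Readers who prefer to treat the exercise as a remark may find the following numerical comparison helpful. It is a sketch under stated assumptions: the frustum is taken to be the lower half of the cone, so that by similar triangles the removed top cone has height 2 and radius 1; the function names are hypothetical.

```python
import math

def cone_volume(radius, height):
    """Volume of a cone: one-third of base area times height."""
    return math.pi * radius ** 2 * height / 3

def frustum_volume_averaged(top_area, bottom_area, height):
    """The 'plausible' rule: average of the two end areas times the height."""
    return (top_area + bottom_area) * height / 2

# Cone of height 4 and base radius 2, cut halfway up (assumed geometry).
exact = cone_volume(2, 4) - cone_volume(1, 2)
averaged = frustum_volume_averaged(math.pi * 1 ** 2, math.pi * 2 ** 2, 2)

print(exact / math.pi)     # about 4.667, i.e. 14/3
print(averaged / math.pi)  # about 5, so the averaging rule overestimates
```

The averaging rule treats the cross-sectional area as if it varied linearly with height, whereas it actually varies with the square of the distance from the apex.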


In summary, the Egyptians and especially the Babylonians acquired a
significant accumulation of elementary geometrical facts that they used to
solve particular numerical problems. However, their surviving texts include
few if any explicit statements of general rules or methods of procedure.
They made no clearcut distinctions between exact and approximate results. 
There is no indication of any emphasis on logical proofs or derivations in 
Egyptian and Babylonian thought. Thus their mathematics, despite its 
notable accomplishments, seems not to have been organized into any 
deductive system of investigation. 

More complete accounts of Egyptian and Babylonian mathematics may 
be found in the books by Boyer [1], Neugebauer [8], and van der Waerden 
[12] cited in the references at the end of this chapter. Detailed discussions 
of ancient approximations and computations of the number $\pi$ may be
found in the articles by Seidenberg [9] and Smeur [10].


Early Greek Geometry


The Babylonian and Egyptian lore of number and geometry was assimilated
by the Greeks, who contributed to mathematics the consciously
logical and explicitly deductive approach that is now its distinguishing 
feature. The history of Greek mathematics begins in the sixth century B.C.
with Thales and Pythagoras, both of whom are said to have traveled to 
Babylonia and Egypt to acquire the knowledge of those lands. 

Thales lived in the first half of the sixth century B.C. On the basis of a
late fourth century B.C. history of Greek mathematics that is now lost, the
fourth century A.D. philosopher Proclus (in his commentary on the first 
book of Euclid’s Elements) states that Thales proved the following theo- 
rems: 


1. A diameter of a circle divides it into two equal parts. 

2. The base angles of an isosceles triangle are equal. 

3. The vertical angles formed by two intersecting straight lines are equal. 
4. The angle-side-angle congruence theorem for triangles. 


In addition, the fact that an angle inscribed in a semi-circle is a right angle 
is still known as the “theorem of Thales.” Whether or not the tools for 
actual proofs of such theorems existed as early as Thales, it is significant 
that he is the first human being to whom proofs of specific mathematical 
results have ever been attributed. Proclus adds (as quoted by van der
Waerden [12], p. 90) that Thales 


made many discoveries himself, in many other things he showed his 
successors the road to the principles. Sometimes he treated questions in a 
more general manner, sometimes in a more intuitive way ... Pythagoras, 
who came after him, transformed this science into a free form of educa- 
tion; he examined this discipline from its first principles and he en- 
deavored to study the propositions, without concrete representation, by 
purely logical thinking.


Pythagoras is thought to have died about 500 B.C. He established a
secret society or cult with distinctly mystical aspects that continued after
his death. However, the Pythagoreans were actively engaged in the pursuit
of learning, including mathematics (which originally meant “that which is
learned”). At their hands the subject gradually assumed an abstract character
that distinguished it from the empirical and pragmatic mathematics
of the Babylonians and Egyptians. Before the end of the fifth century B.C.
they had formulated and proved on a rational basis the common theorems 
dealing with relations between triangles and other rectilinear plane figures 
and their areas. 


“All is number” is quoted as the motto of the Pythagorean school. The 
Greeks used the word number to mean a “whole” number, a positive 
integer. In Greek theoretical mathematics (as distinguished from practical 
or commercial arithmetic) a fraction that we would write as a/b was not 
regarded as a number, as a single entity, but as a relationship or ratio a : b 
between the (whole) numbers a and b. Thus the ratio a : b was, in modern 
terms, simply an ordered pair, rather than a rational number. 

Two ratios were said to be proportional, a : b = c : d, if (with the obvious
meaning) a is the same part or parts or multiple of b as c is of d. For
example, 6 : 9 = 10 : 15 because 6 is two of the three parts of 9, as 10 is
two of the three parts of 15. More formally, a : b = c : d provided that
there exist integers p, q, m, n such that a = mp, b = mq, c = np, d = nq (so
a/b and c/d are both integral multiples of p/q). On this basis the early
Pythagoreans developed an elementary theory of proportionality.
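
The formal definition can be made concrete with a short Python sketch. The names `reduce_ratio` and `proportional` are illustrative, and the use of the greatest common divisor is a modern convenience: it picks out one particular choice of the integers p, q, m, n whose existence the definition requires.

```python
from math import gcd

def reduce_ratio(a, b):
    """Return (p, q, m) with a = m*p and b = m*q, where p : q is in lowest terms."""
    m = gcd(a, b)
    return a // m, b // m, m

def proportional(a, b, c, d):
    """True when a : b = c : d, i.e. both ratios reduce to the same p : q."""
    p1, q1, _ = reduce_ratio(a, b)
    p2, q2, _ = reduce_ratio(c, d)
    return (p1, q1) == (p2, q2)

print(reduce_ratio(6, 9))           # (2, 3, 3)
print(reduce_ratio(10, 15))         # (2, 3, 5)
print(proportional(6, 9, 10, 15))   # True
```

For positive whole numbers this test agrees with the cross-multiplication criterion ad = bc, which is the form in which proportions are usually checked today.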


EXERCISE 7. Establish the following implications.

(i) a : b = c : d => a : c = b : d => ad = bc
(ii) a : c = b : c => a = b
(iii) a : b = c : d => (a + b) : b = (c + d) : d
(iv) a : b = c : d => (a - b) : b = (c - d) : d if a > b.


This discrete view of number or size was also applied to geometrical 
magnitudes—lengths, areas, and volumes. In particular, it was believed by 
the early Pythagoreans that any two line segments are commensurable, that 
is, are multiples of a common unit. On this assumption, the theory of 
integer ratios and proportions readily extends so as to apply to lengths and
areas of simple figures such as line segments and rectangles. For example,
the ratio a : b of the lengths of the two line segments in Figure 6 is equal 
to the ratio 2: 3 of integers, while the ratio A : B of areas of the two 
rectangles is equal to 4: 6. Thus we can talk about proportions a: b= 
A : B=2:3 between ratios of magnitudes of different types—numbers, 
lengths, and areas. 

For simple geometric figures with commensurable dimensions, the usual 
results involving area relationships are then easily established. For exam- 
ple, given two rectangles R and S with commensurable bases a and b and 
equal height h, the ratio A : B of their areas is equal to the ratio a: b of 
their bases. For if a= mc and b=nc where m and n are integers, then R 
consists of m subrectangles with base c and height h, while S consists of n 
such subrectangles. Hence A: B=m:n=a: b. 


EXERCISE 8. Suppose that two rectangles are similar, meaning that the ratio of their 
bases is proportional to the ratio of their heights. If their bases and heights are 
commensurable, prove that the ratio of their areas is proportional to the ratio of 
the squares of (or on) their bases. By taking halves, the same result obtains for 
similar triangles (why?). 


EXERCISE 9. A regular polygon is one with equal sides and equal angles. Define 
similarity for regular polygons. Then prove that the ratio of the areas of two similar 
regular polygons (with commensurable sides) is proportional to the ratio of the 
squares of their respective sides. Hint: By joining its vertices to its center, any 
regular polygon can be dissected into congruent isosceles triangles. 


According to a fragment from the lost history of Eudemus that was 
allegedly copied verbatim in the sixth century A.D. by the Aristotelian 
commentator Simplicius, Hippocrates of Chios (ca. 430 B.C.; not to be
confused with the physician Hippocrates of Cos) proved that the ratio of 
the areas of two circles is equal to the ratio of the squares of their 
diameters (or radii). Presumably he deduced (if not rigorously proved) this 
result by inscribing in the two circles similar regular polygons, and then 
“exhausting” the areas of the circles by increasing indefinitely the number 
of sides of the polygons (Fig. 7). Since, at each stage, the ratio of the areas 
of the two inscribed polygons is equal to the ratio of the squares of the 
radii of the two circles (as a consequence of Exercise 9), it would seem to 
follow “in the limit” that the same is true of the areas of the circles. 
However, Hippocrates probably had no limit concept sufficient to “clinch” 
this essentially infinitesimal argument. 
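
The plausibility argument can be illustrated numerically. The sketch below is a modern aid, not a reconstruction of Hippocrates' reasoning: it uses trigonometry and the value of π, neither of which was available to him, and the function name and the radii 1 and 3 are arbitrary choices.

```python
import math

def inscribed_polygon_area(radius, sides):
    """Area of a regular polygon inscribed in a circle, computed as `sides`
    congruent isosceles triangles with apex angle 2*pi/sides at the center."""
    return 0.5 * sides * radius ** 2 * math.sin(2 * math.pi / sides)

r1, r2 = 1.0, 3.0
for sides in (6, 24, 96, 384):
    a1 = inscribed_polygon_area(r1, sides)
    a2 = inscribed_polygon_area(r2, sides)
    # First ratio: polygon to polygon. Second ratio: polygon to its circle.
    print(sides, a1 / a2, a1 / (math.pi * r1 ** 2))
```

At every stage the ratio of the two polygon areas equals the ratio 1 : 9 of the squares of the radii (up to rounding), while the fraction of the circle covered by the polygon creeps toward 1 without ever reaching it. That gap is exactly what a limit concept is needed to close.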

Although it appears that the area of a circle can be approximated 
arbitrarily closely by the area of an inscribed regular polygon with 
sufficiently many sides, the area of the circle is not precisely equal to that 
of any inscribed polygon. The quadrature or “squaring of the circle,” the
problem of finding a square with area precisely equal to that of a given
circle, was one of the three classical problems of antiquity (together with
the duplication of a cube and the trisection of an angle). This is an
example of a problem, involving a distinction between approximate and
exact computation, that is unlike any considered by the Babylonians and
Egyptians.


Figure 7


EXERCISE 10. Hippocrates applied his result on areas of circles to obtain the 
quadrature of a certain “lune.” Consider a semicircle circumscribed about an 
isosceles right triangle ABC (Fig. 8). Let ADBE be a circular segment on the base 
(hypotenuse) that is similar to the circular segments on the legs of the right triangle. 
Use the fact that similar circular segments are in area as the squares of their bases 
(why?), and the Pythagorean theorem applied to the right triangle ABC, to show 
that the area of the lune ADBC between the circular arcs is equal to the area of the 
triangle ABC, and hence to half of the area of the square on AB. 


According to the introduction to Archimedes’ treatise The Method, 
Democritus (ca. 460 B.C.-ca. 370 B.C.) was the first to discover the fact that
the volume of a pyramid (or cone) is one-third that of a prism (or cylinder) 
with the same base and height, but he did not rigorously prove it. A 
possible indication of Democritus’ approach is given by the following
question attributed to him by Plutarch (quoted by van der Waerden [12], 
p. 138): 


If a cone is cut by surfaces [i.e. planes] parallel to the base, then how are
the sections, equal or unequal? If they were unequal then [i.e. thinking of
the slices as cylinders] the cone would have the shape of a staircase; but if
they were equal, then all sections will be equal, and the cone will look like
a cylinder, made up of equal circles; but this is entirely nonsensical.


Figure 8


Here Democritus is thinking of a solid as being composed of sections 
parallel to its base. From this idea it is plausible to conclude that two 
solids composed of equal parallel sections at equal distances from their 
bases should have equal volumes (Fig. 9). This fact was exploited exten- 
sively by Cavalieri in the early seventeenth century, and now bears his 
name. It implies that triangular pyramids with the same height and bases 
of equal areas will have equal volumes. 
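
The "staircase" picture of a cone built from thin parallel slices can be explored numerically. The following sketch is a modern illustration only: it assumes slabs of equal thickness, each a cylinder lying inside the cone, and the function name is hypothetical. It reports the fraction of the enclosing cylinder (same base and height) that the staircase fills.

```python
def staircase_cone_fraction(slices):
    """Fraction of the enclosing cylinder filled by `slices` equal-thickness
    cylindrical slabs inscribed in a cone. Counting levels from the apex,
    the slab starting at level k has radius k/slices times the base radius."""
    total = sum((k / slices) ** 2 for k in range(slices))
    return total / slices

for n in (10, 100, 1000):
    print(n, staircase_cone_fraction(n))
```

As the slabs become thinner the fraction increases toward one-third, in agreement with the result for the cone that the text attributes to Democritus.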


EXERCISE 11. If the bases of the two pyramids in Figure 9 have equal areas, why
does it follow that corresponding sections parallel to their bases have equal areas? 


Given a triangular pyramid ABCE, it can be “completed” to form a 
prism of the same base and height (Fig. 10). But then the pyramids ABCE,
DEFC, ADEC have equal volumes, because the first and second have 
equal bases and heights, as do the first and third. Consequently the volume 
of the pyramid ABCE is one-third that of the prism. Since any pyramid 
with polygonal base can be dissected into triangular pyramids, the same 
result obtains for arbitrary polygonal pyramids. 


Figure 10


Just as Hippocrates exhausted a circle with inscribed regular polygons, a
circular cone can similarly be exhausted with pyramids over regular 
polygons inscribed in its circular base. It seems to follow “in the limit” that 
the volume of the cone will be one-third that of the cylinder with the same 
base and height. 

We will see that these infinitesimal plausibility arguments of the late 
fifth century B.C. were, about a half century later, converted into rigorous
proofs by Eudoxus. 


Incommensurable Magnitudes and 
Geometric Algebra 


The Pythagorean geometry of the fifth century B.C. was based on the
discrete number concept and theory of proportions discussed in the previ- 
ous section. During the latter part of that century it was discovered that 
there exist pairs of line segments, such as the edge and diagonal of a 
square, that are not commensurable—they cannot be subdivided as in- 
tegral multiples of segments of the same length, and hence the ratio of 
their lengths is not equal to the ratio of two integers. For example, the 
Pythagorean theorem says that the square on the diagonal of the unit 
square has area 2, whereas (in modern terms) $\sqrt{2}$ is not a rational
number. The chronology of this discovery is discussed in detail by 
Knorr [6], Chapter II. 

The existence of incommensurable geometric magnitudes (lengths, areas, 
volumes) necessitated a thorough reexamination and recasting of the 
foundations of mathematics, a task that occupied much of the fourth 
century B.C. During this period Greek algebra and geometry assumed the
highly organized and rigorously deductive form that is set forth in the 13
books of the Elements that Euclid wrote about 300 B.C. This systematic
exposition of the Greek mathematical accomplishments of the preceding 
three centuries is the earliest major Greek mathematical text that is now 
available to us (due perhaps to the extent to which the Elements subsumed 
previous expositions). 

Today we simply say that $\sqrt{2}$ is an irrational number. However, for the
Greeks, the discovery of incommensurability meant that there existed 
geometric magnitudes that could not be measured by numbers! For, as we 
have seen, their conception of numbers as integers alone was discrete in 
character, whereas the phenomenon of incommensurable lengths implied 
that geometric magnitudes have some sort of inherently (and unavoidably)
continuous character. It followed that geometric magnitudes could not be 
manipulated without hesitation in algebraic computations just as though 
they were numbers. Although it was obvious that lengths or areas could be 
added by taking unions of sets, what, for example, would be meant by the 
product or quotient of two lengths or areas?

The Greek answer to such fundamental questions is presented in the
geometric algebra of Books II and VI of Euclid’s Elements. The product of
two lengths a and b is not a third length, but rather the area of a rectangle
with sides a and b. Algebraic identities, such as $a(b+c) = ab + ac$ and
$(a+b)^2 = a^2 + 2ab + b^2$, are interpreted as the geometric propositions
whose proofs are indicated in Figure 11.

Whereas such a simple equation as $x^2 = 2$ has no solution in the domain
of (Greek, rational) numbers, the equation $x^2 = ab$ where a and b are given
lengths can be solved geometrically by constructing a square with edge x
whose area is equal to that of the rectangle with sides a and b. This is the
real point (not always understood) of the “ruler and compass” constructions
of the Elements: the solution of algebraic equations in terms of
geometric magnitudes.


EXERCISE 12. If x is the chord in Figure 12 of a semicircle of diameter a + b, apply
the Pythagorean theorem to show that $x^2 = ab$.
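
A numerical check of this construction is sketched below. Since the figure is not reproduced here, the code assumes the standard reading: x is the half-chord erected perpendicular to the diameter at the point dividing it into parts a and b. The function name and the sample lengths 4 and 9 are illustrative.

```python
import math

def mean_proportional(a, b):
    """Length of the perpendicular half-chord erected on a diameter of
    length a + b at the point dividing it into parts a and b."""
    radius = (a + b) / 2
    offset = a - radius  # signed distance of the foot of x from the center
    # Right triangle: hypotenuse = radius, legs = |offset| and x.
    return math.sqrt(radius ** 2 - offset ** 2)

x = mean_proportional(4, 9)
print(x, x ** 2)  # 6.0 36.0
```

Here the square on x has the same area as the 4-by-9 rectangle, so the construction "solves" the equation without ever treating x as a number.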


The principal Greek technique for the geometric solution of algebraic 
equations was based on the “application of areas.” For example, given a 
segment AB of length a, the construction in Proposition I.44 (Prop. 44 of 
Book I) of the Elements, of a rectangle with base AB and area equal to that 
of a given square of edge b (Fig. 13), provides a solution of the equation 
$ax = b^2$. This corresponds to geometric division, and we say that the given
area $b^2$ has been applied to the given segment AB.

Proposition VI.28 of the Elements shows how to apply a given area $b^2$ to 
a given segment of length a, but “deficient” by a square. That is, a 
rectangle ABCD of area $b^2$ is constructed with its base lying along the 
given segment AP (of length a), but falling short (of the rectangle on the 
whole segment AP) by a square (CDPQ). This construction (Fig. 14) 
provides a geometric solution of the quadratic equation $ax - x^2 = b^2$. 
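
As a hedged modern illustration (the Greeks, of course, worked with areas rather than numbers), the deficient application of areas can be mimicked by solving $ax - x^2 = b^2$ numerically. The function name `apply_area_deficient` is a hypothetical label chosen here. The condition b ≤ a/2 is the modern counterpart of the requirement that the given area not exceed the square on half the segment.

```python
import math


def apply_area_deficient(a: float, b: float):
    """Return both positive roots x of a*x - x**2 == b**2.

    x is the side of the square by which the applied rectangle falls
    short; the rectangle itself has sides x and a - x.
    """
    half = a / 2.0
    if b > half:
        raise ValueError("no solution: b**2 exceeds the square on a/2")
    d = math.sqrt(half ** 2 - b ** 2)
    return half - d, half + d


if __name__ == "__main__":
    a, b = 10.0, 4.0
    for x in apply_area_deficient(a, b):
        # Each root gives a rectangle of area x * (a - x) equal to b**2.
        print(x, x * (a - x), b ** 2)
```

For a = 10 and b = 4 the two roots are 2 and 8, and in either case the rectangle with sides x and a − x has area 16.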


EXERCISE 13. Proposition VI.29 of Euclid’s Elements shows how to apply a given 
area $b^2$ to a line segment of length a, but “in excess” by a square. That is, the base 
AB of the constructed rectangle with area $b^2$ extends beyond the given segment AP 
of length a, with the “excess” part of this rectangle being a square. Draw the 
indicated figure, and interpret the construction as a geometric solution of the 
quadratic equation $ax + x^2 = b^2$. 


The Greeks used these admittedly cumbersome techniques of geometric 
algebra to handle with power and precision the staple fare of today’s high 
school algebra, but without assuming the existence of irrational numbers. 
They were well aware of the existence of geometric magnitudes that we call 
“irrational,” but simply did not think of them as numbers. This was not a 
lack of sophistication on their part, but rather a direct result of their 
unyielding insistence on logical rigor. In this connection, it is instructive to 
examine Book X of Euclid’s Elements, which devotes 115 propositions and 
over 250 pages (in Heath’s annotated translation [2]) to a comprehensive 
classification of irrational magnitudes of the forms $a \pm \sqrt{b}$, 
$\sqrt{a} \pm \sqrt{b}$, $\sqrt{a \pm \sqrt{b}}$, and $\sqrt{\sqrt{a} \pm \sqrt{b}}$, 
where a and b are commensurable lengths. 


Eudoxus and Geometric Proportions 


Any two line segments can be compared (by ruler and compass methods if 
one insists) to determine which has the greater length, and two lengths can 
be added by placing two line segments end-to-end to form a third one. The 
application of areas technique made possible the same operations with 
areas, because any rectilinear plane figure could be transformed into a 
rectangle with the same area and with a preassigned height. The areas of 
two rectangles with the same height could then be compared by comparing 
the lengths of their bases, and could be added by placing the two 
rectangles side-by-side to form a third one. By repeated addition, a 
geometric magnitude (length, area, or volume) can be multiplied by a 
positive integer. 

However, the discovery of incommensurables made the Pythagorean 
theory of integral proportions useless for the comparison of ratios of 
geometric magnitudes, and thereby invalidated those geometric proofs that 
had utilized proportionality concepts. This crisis in the foundations of 
geometry was resolved by Eudoxus of Cnidus (408?-355? B.c.), a student 
at Plato’s Academy in Athens who became the greatest mathematician of 
the fourth century B.c. 

The key to Eudoxus’ accomplishment was (as often happens in mathematics) 
the proper formulation of a definition, in this case the definition 
of proportionality of ratios of geometric magnitudes. Let a and b be 
geometric magnitudes of the same type (both lengths or areas or volumes). 
Let c and d be a second pair of geometric magnitudes, both of the same 
type (but not necessarily the same type as the first pair). Then Eudoxus 
defines the ratios a : b and c: d to be proportional, a: b=c : d, provided 
that, given any two positive integers m and n, it follows that either 


$$na > mb \quad\text{and}\quad nc > md, \tag{1}$$

or 

$$na = mb \quad\text{and}\quad nc = md, \tag{2}$$

or 

$$na < mb \quad\text{and}\quad nc < md. \tag{3}$$


EXERCISE 14. Show that Eudoxus’ definition generalizes the familiar notion of 
proportionality (or equality) of ratios of integers. In particular, if a, b,c, d are 
integers such that a/b=c/d, and m and n are two positive integers, show that (1), 
(2), or (3) holds, depending on whether m/n is less than, equal to, or greater than 
a/b = c/d. 


Thus Eudoxus’ definition of proportionality for geometric ratios is 
simply a necessarily ponderous way of saying what is essentially obvious in 
the case of proportional ratios of numbers. In addition, it may be noted 
that, given incommensurable magnitudes a and b, this definition effectively 
splits the field of rational numbers m/n into two disjoint sets: the set L of 
those for which (1) holds, or m : n<a: b, and the set U of those for which 
(3) holds, or m:n>a: b. A separation of the rational numbers into two 
disjoint subsets L and U, such that every element of L is less than every 
element of U, is now called a “Dedekind cut,” after Richard Dedekind, 
who in the nineteenth century defined a real number to be precisely such a 
“cut” of the rational numbers. Dedekind thereby established a firm foun- 
dation for the real number system by retracing some of Eudoxus’ steps of 
over two thousand years earlier. 
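
To make the definition concrete, here is a minimal Python sketch, not anything found in Euclid. It tests conditions (1)-(3) for integer magnitudes over a finite range of multipliers m and n, so it can refute a proportion but can only support one, not prove it. It also shows the cut determined by the diagonal and side of a square, where m : n < a : b is decided in integers by comparing m² with 2n². The names `eudoxus_proportional` and `in_lower_class` and the search bound are assumptions made for illustration.

```python
from fractions import Fraction


def sign(x) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def eudoxus_proportional(a, b, c, d, bound: int = 40) -> bool:
    """Check a:b = c:d by conditions (1)-(3) for all 1 <= m, n <= bound."""
    for m in range(1, bound + 1):
        for n in range(1, bound + 1):
            # na versus mb must agree with nc versus md.
            if sign(n * a - m * b) != sign(n * c - m * d):
                return False
    return True


def in_lower_class(m: int, n: int) -> bool:
    """True when m/n lies in L for a = diagonal, b = side of a square.

    m : n < a : b means m/n < sqrt(2), i.e. m*m < 2*n*n in integers.
    """
    return m * m < 2 * n * n


if __name__ == "__main__":
    print(eudoxus_proportional(2, 3, 4, 6))   # proportional ratios
    print(eudoxus_proportional(2, 3, 4, 7))   # refuted by m = 2, n = 3
    ratios = {Fraction(m, n) for n in range(1, 13) for m in range(1, 25)}
    lower = max(q for q in ratios if in_lower_class(q.numerator, q.denominator))
    upper = min(q for q in ratios if not in_lower_class(q.numerator, q.denominator))
    print(lower, upper)  # every element of L lies below every element of U
```

Because m² = 2n² has no solution in positive integers, every positive rational falls into exactly one of the two classes, which is the separation described above.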

The general theory of proportionality that Eudoxus erected on the basis 
of the above definition is presented in Book V of Euclid’s Elements. A 
critical assumption is innocuously included in Definition 4 of Book V, 
which states that two geometric magnitudes a and b “are said to have a 
ratio to one another which are capable, when multiplied, of exceeding one 
another,” that is, if there exists an integer n such that na>b. The 
assumption that, given two comparable geometric magnitudes a and b, 
there exists an integer n such that na >b, was first stated explicitly as an 
axiom by Archimedes, with whose name it is therefore usually associated. 
We prefer, however, to call it here the “axiom of Eudoxus.” 

The critical role of this axiom of Eudoxus is illustrated by the proof that 


$$a : c = b : c \quad\text{implies}\quad a = b. \tag{4}$$

Suppose to the contrary that a > b. Then there exists an integer n such that 

$$n(a - b) > c. \tag{5}$$

Let mc be the smallest multiple of c that exceeds nb, so 

$$mc > nb \geq (m - 1)c. \tag{6}$$

Addition of (5) and (6) then gives 

$$na > mc, \quad\text{while}\quad nb < mc,$$

which contradicts the definition of the proportionality a : c = b : c. It 
follows that a = b, as desired. 
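
The two choices made in this proof, first n and then m, can be traced with exact rational arithmetic. The sketch below is illustrative only: the function name `separating_multiples` is hypothetical, and rational numbers stand in for geometric magnitudes, which the Greeks would not have allowed.

```python
from fractions import Fraction


def separating_multiples(a, b, c):
    """For a > b > 0 and c > 0, return (n, m) with n*a > m*c > n*b."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if not (a > b > 0 and c > 0):
        raise ValueError("need a > b > 0 and c > 0")
    n = 1
    while n * (a - b) <= c:   # axiom of Eudoxus applied to a - b and c
        n += 1
    m = 1
    while m * c <= n * b:     # smallest multiple of c that exceeds n*b
        m += 1
    return n, m


if __name__ == "__main__":
    a, b, c = 11, 10, 7
    n, m = separating_multiples(a, b, c)
    print(n, m, n * a, m * c, n * b)
```

For a = 11, b = 10, c = 7 the search gives n = 8 and m = 12, so that 88 > 84 > 80: the pair (m, n) places na above mc but nb below it, so a : c and b : c cannot be proportional.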
As a further example of the extreme care with which Eudoxus framed 
his theory of proportions, let us apply (4) to show that 
$$a : b = c : d \quad\text{implies}\quad ad = bc. \tag{7}$$

First note that 

$$a : b = ad : bd$$

because na > mb implies nad > mbd, etc. Similarly 

$$c : d = bc : bd,$$

so it follows that 

$$ad : bd = bc : bd,$$

which by (4) implies that ad = bc, as desired. 


EXERCISE 15. (Euclid V.16) Show that a : b = c : d implies a : c = b : d. Hints: First 
apply the definition of proportionality to show that 

$$a : b = c : d \quad\text{implies}\quad na : nb = mc : md$$

for any two integers m and n. Then apply (4) and/or its proof to show that 
na > mc, na = mc, na < mc imply nb > md, nb = md, nb < md respectively. Finally 
apply the definition of proportionality again. 


EXERCISE 16. Consider two rectangles with bases a and b, areas A and B, that have 
the same height. Apply Eudoxus’ definition of proportionality to show that A : B= 
a:b. 


The proofs of (4) and (7) above indicate the manner in which the 
“usual” properties of proportions are demonstrated in Book V of the 
Elements, to an extent that enabled the Greeks to work with ratios of 
geometric magnitudes in much the same way, and to the same ends, that 
we today carry out arithmetical computations with real numbers. On this 
basis Eudoxus proceeded to give rigorous proofs of the results of 
Hippocrates and Democritus on areas of circles and volumes of pyramids 
and cones. 

These area and volume computations form the content of Book XII of 
Euclid’s Elements, and of the remainder of this chapter. In order to spare 
the reader a heavy burden of geometric algebra and Eudoxian proportions, 
our exposition will make free use of real numbers and modern algebraic 
notation. However, in order to preserve the original flavor and spirit as 
carefully as possible, we will follow closely both the geometrical construc- 
tions and the logical sequence of the proofs presented by Euclid. 

Before proceeding in this fashion, however, it may be instructive to 
interpret in quite modern terms the Greek view of geometric magnitudes, 
taking the case of area of plane figures as an example. Say that two 
polygonal figures “have the same area” if by application of areas techniques 
they can be transformed to the same rectangle. This is an equivalence 
relation that separates the class of all polygonal plane figures into a 
set $\mathcal{A}$ of equivalence classes. Given a polygonal figure P, denote by 
$a(P) \in \mathcal{A}$ the equivalence class containing P, and call a(P) the area of P. 
Then the set $\mathcal{A}$ of areas is what we might call a “Eudoxian semigroup,” 
an ordered commutative semigroup satisfying the axiom of Eudoxus. That 
is, given $a, b, c \in \mathcal{A}$, it follows that 


1. (Associativity) a+(b+c)=(a+b)+c 

2. (Commutativity) a+b=b+a 

3. a>b implies a+c>b+c 

4. There exists an integer n such that na>b. 


This interpretation emphasizes the fact that it is not necessary to think of 
areas as numbers (as the Greeks did not). 




Area and the Method of Exhaustion 


The Greeks assumed on intuitive grounds that simple curvilinear figures, 
such as circles or ellipses, have areas that are geometric magnitudes of the 
same type as areas of polygonal figures, and that these areas enjoy the 
following two natural properties. 


(i) (Monotonicity) If S is contained in T, then $a(S) \leq a(T)$. 
(ii) (Additivity) If R is the union of the non-overlapping figures S and T, 
then a(R) = a(S) + a(T). 


Given a curvilinear figure S, they attempted to determine its area a(S) 
by means of a sequence $P_1, P_2, P_3, \ldots$ of polygons that fill up or 
“exhaust” S, analogous to Hippocrates’ sequence of regular polygons 
inscribed in a circle. The so-called method of exhaustion was devised, 
apparently by Eudoxus, to provide a rigorous alternative to merely taking 
a vague and unexplained limit of $a(P_n)$ as $n \to \infty$. Indeed, the Greeks 
studiously avoided “taking the limit” explicitly, and this virtual “horror of 
the infinite” is probably responsible for the logical clarity of the method of 
exhaustion. 

In any event, the crux of the matter consists of showing that the area 
$a(S - P_n)$ of the difference between the figure S and the inscribed polygon 
$P_n$ can be made as small as desired by choosing n sufficiently large. For 
this purpose the following consequence (Euclid X.1) of the Archimedes- 
Eudoxus axiom is repeatedly applied. 


Two unequal magnitudes being set out, if from the greater there be 
subtracted a magnitude greater than its half, and from that which is left a 
magnitude greater than its half, and if this process be repeated continu- 
ally, there will be left some magnitude which will be less than the lesser 
magnitude set out. 


This result, which we will call “Eudoxus’ principle,” may be phrased as 
follows. Let $M_0$ and $\epsilon$ be the two given magnitudes, and 
$M_1, M_2, M_3, \ldots$ a sequence such that $M_1 < \tfrac{1}{2}M_0$, 
$M_2 < \tfrac{1}{2}M_1$, $M_3 < \tfrac{1}{2}M_2$, 
etc. Then we want to conclude that $M_n < \epsilon$ for some n. To see that this is 
so, choose an integer N such that 

$$(N+1)\epsilon > M_0.$$

Then $\epsilon$ is at most half of $(N+1)\epsilon$, so it follows that 

$$N\epsilon > \tfrac{1}{2}M_0 > M_1.$$

Similarly, $\epsilon$ is at most half of $N\epsilon$, so 

$$(N-1)\epsilon > \tfrac{1}{2}M_1 > M_2.$$

Proceeding in this way, at each step subtracting $\epsilon$ (which is at most half) 
from the left-hand side and halving the right-hand side, we arrive in N 
steps at the desired inequality 

$$\epsilon > M_N. \qquad \square$$
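
The bound produced by this argument is easy to test on a sample sequence. The following Python sketch is an illustration under stated assumptions: it takes the special case $M_{k+1} = rM_k$ with a fixed ratio r < 1/2 (the principle itself allows any sequence that more than halves at each step), and the function names are hypothetical.

```python
from fractions import Fraction


def eudoxus_bound(m0, eps) -> int:
    """Smallest integer N >= 1 with (N + 1) * eps > m0."""
    n = 1
    while (n + 1) * eps <= m0:
        n += 1
    return n


def first_below(m0, eps, ratio) -> int:
    """First index n with M_n < eps, where M_0 = m0 and M_{k+1} = ratio * M_k."""
    ratio = Fraction(ratio)
    if not 0 < ratio < Fraction(1, 2):
        raise ValueError("ratio must lie strictly between 0 and 1/2")
    m, n = Fraction(m0), 0
    while m >= eps:
        m *= ratio
        n += 1
    return n


if __name__ == "__main__":
    m0, eps = 100, 1
    print(first_below(m0, eps, Fraction(49, 100)), eudoxus_bound(m0, eps))
```

With $M_0 = 100$, $\epsilon = 1$ and r = 49/100, the sequence first drops below $\epsilon$ at n = 7, well within the N = 100 steps that the argument guarantees. The proof gives a safe bound, not a sharp one.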


EXERCISE 17. Conclude from Eudoxus’ principle that, if M, $\epsilon$, and $r < \tfrac{1}{2}$ are given 
positive numbers, then $Mr^n < \epsilon$ for n sufficiently large. Is it necessary that r be at 
most $\tfrac{1}{2}$? 


We first apply Eudoxus’ principle to describe precisely the manner in 
which the area of a circle can be exhausted by means of inscribed 
polygons. 


Given a circle C and a number $\epsilon > 0$, there exists a regular polygon P 
inscribed in C such that 

$$a(C) - a(P) < \epsilon. \tag{8}$$


PROOF. We start with a square $P_0 = EFGH$ inscribed in the circle C, and 
write $M_0 = a(C) - a(P_0)$. Doubling the number of sides, we obtain a 
regular octagon $P_1$ inscribed in C (Fig. 15). 
Continuing in this fashion, we obtain a sequence $P_0, P_1, P_2, \ldots, 
P_n, \ldots$, with $P_n$ having $2^{n+2}$ sides. Writing 

$$M_n = a(C) - a(P_n),$$

we want to show that 

$$M_n - M_{n+1} > \tfrac{1}{2}M_n. \tag{9}$$

It will then follow from Eudoxus’ principle that $M_n < \epsilon$ for n sufficiently 
large, and we will be finished. 

Figure 15 

The proof of (9) is essentially the same for all n, so we consider the case 
n = 0 illustrated by Fig. 15. Then 

$$\begin{aligned}
M_0 - M_1 &= a(P_1) - a(P_0) \\
&= 4a(\triangle EFK) \\
&= 2a(EFF'E') \\
&> 2a(\widehat{EKF}) \\
&= \tfrac{1}{2} \cdot 4a(\widehat{EKF}) \\
&= \tfrac{1}{2}[a(C) - a(P_0)], \\
M_0 - M_1 &> \tfrac{1}{2}M_0,
\end{aligned}$$

where we denote by $\widehat{EKF}$ the circular segment cut off the circle by the side 
EF of the square $P_0$. In the general case, we obtain 

$$M_n - M_{n+1} = a(P_{n+1}) - a(P_n) > \tfrac{1}{2}[a(C) - a(P_n)] = \tfrac{1}{2}M_n,$$

where $a(C) - a(P_n)$ is the sum of the areas of the $2^{n+2}$ circular segments 
cut off by the edges of $P_n$. $\square$
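
Inequality (9) can be observed numerically for the unit circle. The sketch below is a modern check, not a Greek argument: it assumes the trigonometric area formula $\tfrac{1}{2}k\sin(2\pi/k)$ for a regular k-gon inscribed in a unit circle and uses the value of π directly, both of which are anachronistic conveniences. The function names are hypothetical.

```python
import math


def inscribed_polygon_area(n: int, radius: float = 1.0) -> float:
    """Area of P_n, the regular polygon with 2**(n + 2) sides inscribed in the circle."""
    sides = 2 ** (n + 2)
    return 0.5 * sides * radius ** 2 * math.sin(2.0 * math.pi / sides)


def deficit(n: int, radius: float = 1.0) -> float:
    """M_n = a(C) - a(P_n), the area left over outside P_n."""
    return math.pi * radius ** 2 - inscribed_polygon_area(n, radius)


def first_index_below(eps: float, radius: float = 1.0) -> int:
    """Smallest n with M_n < eps."""
    n = 0
    while deficit(n, radius) >= eps:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(6):
        removed = deficit(n) - deficit(n + 1)
        # Each doubling removes more than half of what was left.
        print(n, 2 ** (n + 2), deficit(n), removed > 0.5 * deficit(n))
    print(first_index_below(0.01))
```

In this computation each doubling in fact removes close to three quarters of the remaining area, comfortably more than the half that Eudoxus’ principle requires.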


The above lemma provides the basis for a rigorous proof of the theorem 
on areas of circles (Euclid XII.2). 


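If $C_1$ and $C_2$ are circles with radii $r_1$ and $r_2$, then 

$$\frac{a(C_1)}{a(C_2)} = \frac{r_1^2}{r_2^2}. \tag{10}$$

PROOF. If $A_1 = a(C_1)$, $A_2 = a(C_2)$, then either 

$$\frac{A_1}{A_2} = \frac{r_1^2}{r_2^2}, \quad\text{or}\quad \frac{A_1}{A_2} < \frac{r_1^2}{r_2^2}, \quad\text{or}\quad \frac{A_1}{A_2} > \frac{r_1^2}{r_2^2}.$$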


The proof is a double reductio ad absurdum argument, characteristic of 
Greek geometry, in which we show that the assumption of either of the 
inequalities leads to a contradiction. 
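Suppose first that 

$$\frac{A_1}{A_2} < \frac{r_1^2}{r_2^2}, \quad\text{or}\quad A_2 > \frac{A_1 r_2^2}{r_1^2} = S,$$

and let $\epsilon = A_2 - S > 0$. Then, by the lemma, there exists a polygon $P_2$ 
inscribed in $C_2$ such that 

$$a(C_2) - a(P_2) < \epsilon = A_2 - S,$$

so $a(P_2) > S$. But, if $P_1$ denotes the similar regular polygon inscribed in $C_1$, then 

$$\frac{a(P_1)}{a(P_2)} = \frac{r_1^2}{r_2^2} = \frac{A_1}{S} \qquad \text{(Exercise 9)},$$

and hence 

$$\frac{S}{a(P_2)} = \frac{A_1}{a(P_1)} = \frac{a(C_1)}{a(P_1)} > 1,$$

so $S > a(P_2)$, which is a contradiction. Hence the assumption 
$A_1/A_2 < r_1^2/r_2^2$ is false. 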

By interchanging the roles of the two circles, we find similarly that the 
assumption 

$$\frac{A_1}{A_2} > \frac{r_1^2}{r_2^2}$$

is also false. We therefore conclude that (10) holds, as desired. $\square$


If we rewrite (10) as 

$$\frac{A_1}{r_1^2} = \frac{A_2}{r_2^2} \tag{11}$$

and denote by $\pi$ the common value of these two ratios, then we obtain the 
familiar formula $A = \pi r^2$ for the area of a circle. In fact, however, the 
Greeks could not do this, because for them (11) was a proportion between 
ratios of areas, rather than a numerical equality. Hence the number $\pi$ does 
not appear in this connection in Greek mathematics. 


EXERCISE 18. Apply the lemma on the exhaustion of a circle by inscribed polygons, 
together with the fact that the volume of a prism is the product of its height and the 
area of its base, to give a double reductio ad absurdum proof that the volume of a 
circular cylinder is equal to the product of its height and the area of its base. Given 
a polygon P inscribed in the base circle, consider the prism Q with base P and 
height equal to that of the cylinder. Then the cylinder can be exhausted by prisms 
like Q. 


Volumes of Cones and Pyramids 


If P is either a triangular pyramid or a circular cone, then its volume is 
given by 

$$v(P) = \tfrac{1}{3}Ah, \tag{12}$$

where h is its height and A the area of its base. According to Archimedes, 
the two results described by this formula were discovered by Democritus, 
but were first proved by Eudoxus. In this section we discuss their 
treatment by Euclid in Book XII of the Elements. 

Figure 16 

The calculation of the volume of a pyramid is based on the dissection of 
an arbitrary pyramid with triangular base into two prisms and two similar 
pyramids, as indicated in Figure 16. The points E, F, G, K, L, M are the 
midpoints of the six edges of the pyramid OBCD. It is clear that the 
pyramids OEFG and EBKM are similar to OBCD and are congruent to 
each other. The crucial fact about this dissection is that the sum of the 
volumes of the two prisms 


EKMFCL and MLDEFG 
is greater than half the volume of the original pyramid OBCD. This is true 
because 
v(OEFG) = v( FKCL) < v(EKMFCL) 
and 
v(EBKM) = v(GMLD) < v(MLDEFG). 


If we denote by h the height and by A the area of the base BCD of the 
pyramid OBCD, then 

$$v(MLDEFG) = \tfrac{1}{8}Ah$$

because the height of this prism is $\tfrac{1}{2}h$ and the area of its base MLD is $\tfrac{1}{4}A$. 
Also, 

$$v(EKMFCL) = \tfrac{1}{8}Ah$$

because the area of the parallelogram KCLM is $\tfrac{1}{2}A$, and the prism 
EKMFCL is half of a parallelepiped with base KCLM and height $\tfrac{1}{2}h$. 
Consequently the sum of the volumes of the two smaller prisms is $\tfrac{1}{4}Ah$. 
Now let us similarly dissect each of the two pyramids OEFG and EBKM 
into two smaller pyramids and two prisms. The sum of the volumes of the 
four resulting smaller prisms is then greater than half of the sum of the 
volumes of the pyramids OEFG and EBKM. Because these two latter 
pyramids both have height h/2 and base area A/4, it follows that the sum 
of the volumes of the four smaller prisms is 

$$2 \cdot \frac{1}{4} \cdot \frac{A}{4} \cdot \frac{h}{2} = \frac{Ah}{4^2}.$$

After n steps of this sort, we obtain an n-step-dissection of the original 
pyramid. At the kth step we have $2^k$ subdivided small pyramids, and hence 
$2^k$ pairs of smaller prisms. Each of the $2^k$ small pyramids has height $h/2^k$ 
and base area $A/4^k$, so the sum of the volumes of the $2^k$ pairs of smaller 
prisms is 

$$2^k \cdot \frac{1}{4} \cdot \frac{A}{4^k} \cdot \frac{h}{2^k} = \frac{Ah}{4^{k+1}}.$$

Finally, if P denotes the union of all the prisms obtained in all the steps 
of this n-step-dissection, it follows that 

$$v(P) = Ah\left(\frac{1}{4} + \frac{1}{4^2} + \cdots + \frac{1}{4^n}\right). \tag{13}$$

Furthermore, because at each step the sum of the volumes of the prisms is 
greater than half the sum of the pyramids obtained in the previous step, 
Eudoxus’ principle implies that, given $\epsilon > 0$, 

$$V - v(P) < \epsilon \tag{14}$$

if n is sufficiently large, and V = v(OBCD). This construction is the basis 
for Euclid’s proof of Proposition XII.5. 


Given two triangular pyramids with the same height and with base areas $A_1$ 
and $A_2$, the ratio of their volumes $V_1$ and $V_2$ is equal to that of their base 
areas, 

$$\frac{V_1}{V_2} = \frac{A_1}{A_2}. \tag{15}$$

PROOF. The demonstration of (15) is a double reductio ad absurdum 
argument almost identical to that used in the proof of the theorem on areas 
of circles. Suppose first that 

$$\frac{V_1}{V_2} < \frac{A_1}{A_2}, \quad\text{or}\quad V_2 > \frac{V_1 A_2}{A_1} = S,$$

and let $\epsilon = V_2 - S$. Denote by $P_2$ the union of all the prisms obtained in an 
n-step-dissection of the second pyramid, with n sufficiently large that 

$$V_2 - v(P_2) < \epsilon = V_2 - S,$$

so $v(P_2) > S$. It then follows from (13) that, if $P_1$ is the similar union of 
prisms obtained in an n-step-dissection of the first pyramid, then 

$$\frac{v(P_1)}{v(P_2)} = \frac{A_1}{A_2} = \frac{V_1}{S}.$$

Hence 

$$\frac{S}{v(P_2)} = \frac{V_1}{v(P_1)} > 1$$

because $P_1$ is properly contained in the first pyramid. But $S > v(P_2)$ is a 
contradiction, so the assumption $V_1/V_2 < A_1/A_2$ is false. 

By interchanging the roles of the two pyramids, we find that the 
assumption $V_1/V_2 > A_1/A_2$ is also false. It therefore follows that 
$V_1/V_2 = A_1/A_2$, as desired. $\square$


We have previously seen that the formula $V = \tfrac{1}{3}Ah$, for the volume of a 
triangular pyramid, follows from the fact that two pyramids with equal 
heights and base areas must have the same volumes. For any given 
pyramid is one of three pyramids with equal volumes, whose union is a 
prism with height and base area equal to those of the given pyramid (see 
Figure 10). We assume here (as in the above construction) the elementary 
fact that the volume of a prism is the product of its height and its base 
area. 

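Alternatively, it is interesting to derive the formula $V = \tfrac{1}{3}Ah$ directly by 
using the sum of the geometric series 

$$\frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \cdots = \frac{1}{3}.$$

Given $\epsilon > 0$, we see from (13) that 

$$V - Ah\left(\frac{1}{4} + \frac{1}{4^2} + \cdots + \frac{1}{4^n}\right) < \epsilon$$

if n is sufficiently large. It follows that the volume of the pyramid is 

$$V = Ah\left(\frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \cdots\right) = \frac{1}{3}Ah.$$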

Although the Greeks knew how to sum a finite geometric progression, they 
used reductio ad absurdum arguments to avoid the formal summation of an 
infinite series. 
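
The finite sums in question can be computed exactly. The sketch below, with hypothetical function names, evaluates the bracketed factor of (13) as a rational number and shows how far it falls short of one third after n steps. It is a modern illustration of why the remainder can be made smaller than any given magnitude, not a reconstruction of the Greek argument.

```python
from fractions import Fraction


def prism_fraction(n: int) -> Fraction:
    """Exact value of 1/4 + 1/4**2 + ... + 1/4**n, i.e. v(P)/(Ah) in (13)."""
    return sum(Fraction(1, 4 ** k) for k in range(1, n + 1))


def remainder(n: int) -> Fraction:
    """Shortfall of the n-step sum from one third."""
    return Fraction(1, 3) - prism_fraction(n)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, prism_fraction(n), remainder(n))
```

After n steps the shortfall is exactly $1/(3 \cdot 4^n)$, so it shrinks by a factor of four at each step of the dissection.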


EXERCISE 19. Show that the volume formula $V = \tfrac{1}{3}Ah$ holds for a pyramid whose 
base is an arbitrary convex polygon (that can be dissected into triangles, thereby 
dissecting the pyramid into triangular pyramids for which the formula is already 
known). 


EXERCISE 20. Show that the ratio of the volumes of two similar pyramids is equal to 
the ratio of the cubes of corresponding edges. 


In Proposition XII.10 Euclid uses inscribed pyramids to exhaust a 
circular cone so as to establish the volume formula $V = \tfrac{1}{3}Ah$ for cones. To 
outline this proof, let T be a cone with vertex O, base circle C, and height 
h. Let 

$$P_0, P_1, P_2, \ldots, P_n, \ldots$$

be the sequence of inscribed regular polygons previously used to exhaust 
the circle, with $P_n$ having $2^{n+2}$ sides. If $T_n$ denotes the pyramid with vertex 
O and base $P_n$, then 

$$T_0, T_1, T_2, \ldots, T_n, \ldots$$

is a sequence of pyramids inscribed in the cone T, and $v(T_n) = \tfrac{1}{3}a(P_n)h$, 
where h is the height of T. 

Recall that we proved that, if $M_n = a(C) - a(P_n)$, then 
$M_n - M_{n+1} > \tfrac{1}{2}M_n$. By joining every polygon involved with the vertex O, we can 
similarly prove that, if 

$$M_n = v(T) - v(T_n),$$

then 

$$M_n - M_{n+1} > \tfrac{1}{2}M_n.$$

Eudoxus’ principle therefore implies that, given $\epsilon > 0$, 

$$M_n = v(T) - v(T_n) < \epsilon \tag{16}$$

if n is sufficiently large. Also, if Q is the cylinder with base C and height h, 
and $Q_n$ is the inscribed prism with base $P_n$ and height h, then 

$$v(Q) - v(Q_n) < \epsilon$$

for n sufficiently large (why?). 
We are now ready for the reductio ad absurdum proof that 

$$v(T) = \tfrac{1}{3}v(Q) = \tfrac{1}{3}Ah. \tag{17}$$

Otherwise, either $v(T) < \tfrac{1}{3}v(Q)$ or $v(T) > \tfrac{1}{3}v(Q)$. 
Assuming that $v(T) < \tfrac{1}{3}v(Q)$, choose n sufficiently large that 

$$v(Q) - v(Q_n) < v(Q) - 3v(T).$$

Then $v(Q_n) > 3v(T) > 3v(T_n)$ because the pyramid $T_n$ is inscribed in the 
cone T. But the conclusion that $v(T_n) < \tfrac{1}{3}v(Q_n)$ is a contradiction, because 
the pyramid $T_n$ and the prism $Q_n$ have the same base and height, so we 
know (Exercise 19) that $v(T_n) = \tfrac{1}{3}v(Q_n)$. 

Assuming that $v(T) > \tfrac{1}{3}v(Q)$, choose n sufficiently large that 

$$v(T) - v(T_n) < v(T) - \tfrac{1}{3}v(Q).$$

Then $v(T_n) > \tfrac{1}{3}v(Q) > \tfrac{1}{3}v(Q_n)$ because the prism $Q_n$ is inscribed in the 
cylinder Q. But this is a contradiction for the same reason as before, so we 
conclude that $v(T) = \tfrac{1}{3}v(Q)$ as desired. $\square$


Volumes of Spheres 


The final result in Book XII of the Elements is Proposition 18, to the effect 
that the volume of a sphere is proportional to the cube of its radius. Euclid 
proves this in the following form. 


If $S_1$ and $S_2$ are two spheres with radii $r_1$ and $r_2$ and volumes $V_1$ and $V_2$, 
then 

$$\frac{V_1}{V_2} = \frac{r_1^3}{r_2^3}. \tag{18}$$


As a preliminary lemma (XII.17) he shows that, given two concentric 
spheres S and S’ with S’ interior to S, there exists a polyhedral solid P 
inscribed in S that contains S’ in its interior. The polyhedral solid P is a 
union of finitely many pyramids, each of which has the common center O 
of the two spheres as its vertex, with its base being a polygon inscribed in 
the outer sphere S (Fig. 17). 

In his proof of Proposition 18, Euclid assumes without proof that, given 
a sphere S with volume V and $V' < V$, there exists a concentric sphere $S'$ 
with $v(S') = V'$. We will repair this minor gap by using the slightly simpler 
fact that there exists a concentric sphere $S'$ with $V' < v(S') < V$ (Exercise 
21 below). 

Figure 18 

Assuming that $V_1/V_2 < r_1^3/r_2^3$, let 

$$\epsilon = V_2 - \frac{r_2^3}{r_1^3}V_1 > 0,$$

and let $\bar{S}$ be a sphere interior to and concentric with $S_2$ (see Fig. 18) such 
that 

$$v(\bar{S}) = \bar{V} > V_2 - \epsilon = \frac{r_2^3}{r_1^3}V_1. \tag{19}$$

Now let $P_2$ be a polyhedral solid inscribed in $S_2$ that contains $\bar{S}$ in its 
interior. If $P_1$ is the similar polyhedral solid inscribed in $S_1$, then the 
volumes $V_1' = v(P_1)$ and $V_2' = v(P_2)$ satisfy 

$$\frac{V_1'}{V_2'} = \frac{r_1^3}{r_2^3} \tag{20}$$

by Exercise 20, because $P_1$ and $P_2$ are made up of pairwise similar 
pyramids with corresponding edges $r_1$ and $r_2$. Hence 

$$V_2' > \bar{V} > \frac{r_2^3 V_1}{r_1^3}$$

by (19), so 

$$V_1' = \frac{r_1^3}{r_2^3}V_2' > V_1$$

by (20). Thus $v(S_1) = V_1 < V_1' = v(P_1)$. But this is a contradiction, because 
$S_1$ contains $P_1$. 

Interchanging the roles of the two spheres $S_1$ and $S_2$, the assumption 
that $V_1/V_2 > r_1^3/r_2^3$ leads similarly to a contradiction. Consequently we 
conclude that $V_1/V_2 = r_1^3/r_2^3$, as desired. $\square$


EXERCISE 21. Let S be a sphere of radius r and equatorial circle C, and $\epsilon > 0$. Show 
as follows, without using the formula for the volume of a sphere, that there is a 
sphere $\bar{S}$ with $v(S) - \epsilon < v(\bar{S}) < v(S)$. First choose $\delta > 0$ such that 


$$\pi(r+\delta)^2 - \pi(r-\delta)^2 = 4\pi r\delta < \frac{\epsilon}{4\pi r}.$$


If $r - \delta < \bar{r} < r$, then the annular ring A bounded by C and the concentric circle $\bar{C}$ 
of radius $\bar{r}$ can be covered by non-overlapping rectangles $R_1, R_2, \ldots, R_n$ such that 

$$\sum_{i=1}^{n} a(R_i) < \frac{\epsilon}{4\pi r}. \qquad \text{(Why?)}$$

If $T_i$ is the cylinder-with-hole obtained by revolving the rectangle $R_i$ about a 
horizontal axis through the center of the circles, then the sets $T_1, T_2, \ldots, T_n$ cover 
the spherical shell between the sphere S and the sphere $\bar{S}$ of radius $\bar{r}$. Now apply 
the formula for the volume of a cylinder to show that 

$$\sum_{i=1}^{n} v(T_i) < \epsilon.$$

Why does this imply that $v(S) - \epsilon < v(\bar{S}) < v(S)$? 


Let $S_1$ be an arbitrary sphere with radius r and volume V, and denote by 
$\alpha$ the volume of a sphere $S_2$ with unit radius. Then Equation (18) yields the 
volume formula 

$$V = \alpha r^3, \tag{21}$$

according to which the volume of a sphere is proportional to the cube of its 
radius. There is no indication that Euclid or his predecessors knew the 
relationship between $\alpha$ and $\pi$; it was Archimedes who discovered that 
$\alpha = 4\pi/3$ (see Chapter 2). 


It is instructive to examine the common pattern of the proofs of the five 
basic results from Euclid XII that we have discussed in this and the 
preceding two sections. Each of these theorems compares the areas or 
volumes of two sets A and B that are either 


1. two circles, 
2. two cylinders with the same height, 
3. two pyramids with the same height, 
4. a cone and a cylinder with the same height, or 
5. two spheres. 

In particular, we want to prove that 

$$v(B) = kv(A), \tag{22}$$

where the proportionality constant k is equal in the five cases, respectively, to 

1. the ratio of the squares of their radii. 
2. the ratio of the squares of their radii. 
3. the ratio of their base areas. 
4. one-third. 
5. the ratio of the cubes of their radii. 

The first step in each proof is the construction of two sequences of 
polygonal or polyhedral figures, $\{P_n\}_1^\infty$ inscribed in A and $\{Q_n\}_1^\infty$ 
inscribed in B, such that 

$$v(Q_n) = kv(P_n)$$

for all n. Eudoxus’ principle is applied to the construction to show that, 
given $\epsilon > 0$, 

$$v(A) - v(P_n) < \epsilon \quad\text{and}\quad v(B) - v(Q_n) < \epsilon$$

if n is sufficiently large. 
In terms of the modern limit concept we would complete the proof by 
simply noting that 


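$$\begin{aligned}
v(B) &= \lim_{n \to \infty} v(Q_n) \\
&= \lim_{n \to \infty} kv(P_n) \\
&= k \lim_{n \to \infty} v(P_n), \\
v(B) &= kv(A).
\end{aligned}$$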


In essence, the Greeks avoided this explicit use of limits by completing 
the proof by means of the double reductio ad absurdum argument. Assum- 
ing that 


v(B) > kv(A), 


and writing € = v(B) — kv(A), we choose inscribed figures P in A and Q in 
B such that 


$$v(Q) = kv(P) \quad\text{and}\quad v(Q) > v(B) - \epsilon = kv(A).$$


But this is a contradiction: together these give kv(P) > kv(A), whereas 
v(P) < v(A) because A contains P. 
Reversing the roles of A and B, the assumption v(A) > v(B)/k leads 
similarly to a contradiction, so we must conclude that v(B) = kv(A) as 
desired. $\square$


A logically complete indirect proof is thereby obtained without explicit 
reference to limits. The mystery which the Greeks attached to the infinite 
and, in particular, to what we call the limit concept, is absorbed (if not 
obviated) in Eudoxus’ principle. In this connection, Aristotle remarked 
that mathematicians make no use of magnitudes infinitely large or small, 
but content themselves with magnitudes that can be made as large or as 
small as they please (quoted by Heath [3], p. 272). 


References 


[1] C. B. Boyer, A History of Mathematics. New York: Wiley, 1968. 
[2] T. L. Heath, The Thirteen Books of Euclid’s Elements. Cambridge University 
Press, 1908. (Dover reprint, 1956). 
[3] T. L. Heath, A History of Greek Mathematics, Vol. I. Oxford University Press, 
1921. 
[4] T. L. Heath, Greek geometry with special reference to infinitesimals. Math 
Gaz 11, 248-259, 1922-23. 
[5] J. Hjelmslev, Eudoxus’ axiom and Archimedes’ lemma. Centaurus 1, 2-11, 
1950. 
[6] W. R. Knorr, The Evolution of the Euclidean Elements. Dordrecht: Reidel, 
1975. 
[7] G. R. Morrow, Proclus, A Commentary on the First Book of Euclid’s Elements. 
Princeton, NJ: Princeton University Press, 1970. 
[8] O. Neugebauer, The Exact Sciences in Antiquity. Brown University Press, 
1957, 2nd ed. 
[9] A. Seidenberg, On the area of a semi-circle. Arch Hist Exact Sci 9, 171-211, 
1972-73. 
[10] A. J. E. M. Smeur, On the value equivalent to 7 in ancient mathematical texts. 
Arch Hist Exact Sci 6, 249-270, 1969-70. 
[11] I. Thomas, Greek Mathematical Works, 2 vols. Cambridge, MA: Harvard 
University Press, 1951. 
[12] B. L. van der Waerden, Science Awakening. Oxford University Press, 1961, 
2nd ed. 
[13] K. von Fritz, The discovery of incommensurability by Hippasus of Metapon- 
tum, Ann Math (2) 46, 242-264, 1945. 


Archimedes 


Introduction 


Archimedes of Syracuse (287—212 B.c.) was the greatest mathematician of 
ancient times, and twenty-two centuries have not diminished the brilliance 
or importance of his work. Another mathematician of comparable power 
and creativity was not seen before Newton in the seventeenth century, nor 
one with similar clarity and elegance of mathematical thought before 
Gauss in the nineteenth century. 

He was famous in his own time for his mechanical inventions—the 
so-called Archimedean screw for pumping water, lever and pulley devices 
(“give me a place to stand and I can move the earth”), a planetarium that 
duplicated the motions of heavenly bodies with such accuracy as to show 
eclipses of the sun and moon, machines of war that terrified Roman 
soldiers in the siege of Syracuse (which, however, resulted in Archimedes’ 
death). For Archimedes himself these inventions were merely the “diver- 
sions of geometry at play,” and the writings that he left behind are devoted 
entirely to mathematical investigations. These treatises have been de- 
scribed by Heath (editor of the standard English edition of the works of 
Archimedes [5]) as 


without exception, models of mathematical exposition; the gradual un- 
folding of the plan of attack, the masterly ordering of the propositions, 
the stern elimination of everything not immediately relevant, the perfect 
finish of the whole, combine to produce a deep impression, almost a 
feeling of awe, in the mind of the reader. There is here, as in all the great 
Greek mathematical masterpieces, no hint as to the kind of analysis by 
which the results were first arrived at; for it is clear that they were not 
discovered by the steps which lead up to them in the finished treatise. If 
the geometrical treatises had stood alone, Archimedes might seem, as 
Wallis said, “as it were of set purpose to have covered up the traces of his 
investigations, as if he had grudged posterity the secret of his method of 
inquiry, while he wished to extort from them assent to his results” ([7], p. 
281). 


In this chapter we discuss those of Archimedes’ extant works that deal 
primarily with area, length, and volume computations, in the following 
order: 


1. Measurement of a Circle 
2. Quadrature of the Parabola 
3. On the Sphere and Cylinder 
4. On Spirals 
5. On Conoids and Spheroids 
6. The Method 


The first five of these develop the method of exhaustion into a technique of 
remarkable power which Archimedes applied to a wide range of problems 
that today are typical applications of the integral calculus, and which 
provided the starting point for the modern development of the calculus. 
Treatise (6), which was unknown until its rediscovery in 1906, describes the 
heuristic infinitesimal method by which Archimedes first discovered many 
of his results. 

Throughout this chapter, as in the later sections of Chapter 1, modern 
algebraic symbolism is used to palatably translate verbal statements and 
arguments that were originally presented in the cumbersome language of 
classical geometric algebra. Recall that Greek mathematics did not repre- 
sent geometric magnitudes in terms of real numbers. Consequently, in 
order to specify the area of a given figure represented geometrically as the 
product of two linear factors, the Greek geometer had to introduce a plane 
figure with area equal to that of the given figure. For example, Archimedes 
would say that the surface area (A) of a right circular cylinder (excluding 
its bases) is equal to the area of a circle whose radius is the mean 
proportional between the height (h) of the cylinder and the diameter (d) of 
its base. For us, A is the product of the height and the circumference of the 
base, and we simply write $A = \pi d h$.
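
As a check on this translation, the following short computation confirms that the verbal statement agrees with the formula just written. The symbol $\rho$ for the mean proportional is an assumed name introduced here for illustration; it does not appear in the original.

```latex
\[
\frac{h}{\rho} = \frac{\rho}{d}
\quad\Longrightarrow\quad
\rho^{2} = hd
\quad\Longrightarrow\quad
\pi\rho^{2} = \pi d h = A .
\]
```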

It must be recognized that this concession to ease of understanding on 
the part of the modern reader entails a loss of certain characteristic 
features of geometric algebra that are important for a full understanding 
and appreciation of classical Greek mathematics. However, we will adhere 
closely to Archimedes’ basic geometric constructions, and thereby attempt 
to preserve those features that seem most important for an understanding 
of the historical development of the calculus. The best comprehensive 
analysis of Archimedes’ works, one that faithfully preserves the flavor of 
antiquity, is that of E. J. Dijksterhuis [3].


The Measurement of a Circle


As we saw in Chapter 1, the awareness (on some level) that the area of a
circle is proportional to the square of its radius, $A = \pi_1 r^2$ for some constant
$\pi_1$, dates back to earliest times. Similarly, the proportionality between a
circle’s circumference and diameter, $C = \pi_2 d$ for some constant $\pi_2$, is an
ancient one. However, it is not clear when it was first realized that the two
proportionality constants are the same, $\pi_1 = \pi_2 = \pi$. In the Measurement of
a Circle, Archimedes provided the first rigorous proof of this fact by
showing that the area of a circle is equal to that of a triangle with base
equal to its circumference and height equal to its radius,


$A = \tfrac{1}{2} r C$. (1)


To see that (1) implies that $\pi_1 = \pi_2$, simply substitute $A = \pi_1 r^2$ and
$C = 2\pi_2 r$. He then showed that


$3\tfrac{10}{71} < \pi < 3\tfrac{1}{7}$ (2)


by explicitly establishing this inequality for the ratio of the circumference 
of a circle to its diameter. 

Formula (1) was certainly known before Archimedes, and it was prob- 
ably deduced by regarding the circle as the union of indefinitely many 
isosceles triangles with the center as their common vertex, and with their 
bases forming an inscribed regular polygon with each of its indefinitely 
many sides almost coinciding with a small arc of the circle. Since the 
height of each triangle will virtually equal the radius of the circle, and the 
sum of their bases will virtually equal its circumference, this picture makes 
the truth of (1) seem evident. 

This heuristic derivation supplies the motivation for Archimedes’ 
rigorous proof. In it he extends the method of exhaustion to what has been 
termed the “method of compression.” Instead of dealing only with in- 
scribed polygons, he employs both inscribed and circumscribed polygons. 
The area of the circle is then “compressed” between the areas of inscribed 
and circumscribed polygons that closely approximate the circle (Fig. 1). 
The following two exercises are preliminaries to the proof. 


EXERCISE 1. Consider a circle with circumference C, and let P and Q be inscribed
and circumscribed polygons, respectively. Show that the perimeter of P is less than
C, while the perimeter of Q is greater than C. Use the facts that $\sin\theta < \theta < \tan\theta$ if
$\theta$ is an angle less than $\pi/2$ radians, and that a central angle of $\theta$ radians subtends
an arc of length $r\theta$ in a circle of radius r.


EXERCISE 2. In Chapter 1 we saw that, given a circle with area A and $\epsilon > 0$, there is
an inscribed regular polygon P with $a(P) > A - \epsilon$. Show similarly that there is a
circumscribed regular polygon Q with $a(Q) < A + \epsilon$.


The proof of (1) is a typical reductio ad absurdum argument. Assuming
that $A > \tfrac{1}{2} r C$, let $\epsilon = A - \tfrac{1}{2} r C$, and choose a regular n-sided polygon P
inscribed in the circle such that


$a(P) > A - \epsilon = \tfrac{1}{2} r C$.

If $s_n$ is the length of a side of P, and $r_n$ is the length of a perpendicular
from the center to a side of P, then

$r_n < r$ and $n s_n < C$


by Exercise 1. Because P is the union of n isosceles triangles with base $s_n$
and height $r_n$, it follows that


$a(P) = n \cdot \tfrac{1}{2} r_n s_n = \tfrac{1}{2} r_n (n s_n) < \tfrac{1}{2} r C$.

But this is a contradiction, so A is not greater than $\tfrac{1}{2} r C$.


Assuming that $A < \tfrac{1}{2} r C$, let $\epsilon = \tfrac{1}{2} r C - A$, and choose a regular n-sided
circumscribed polygon Q such that


$a(Q) < A + \epsilon = \tfrac{1}{2} r C$.


For the purpose of subsequent computations, let $t_n$ denote half the length
of a side of Q. Then

$a(Q) = n \cdot \tfrac{1}{2} r (2 t_n) = \tfrac{1}{2} r (2 n t_n) > \tfrac{1}{2} r C$,


because the perimeter $2 n t_n$ of Q is greater than C (Exercise 1). This
contradiction completes the proof of (1). □


EXERCISE 3. Let $A_n$ and $C_n$ denote the area and perimeter, respectively, of a regular
polygon with n sides inscribed in a circle of radius r. Show that


$A_n = n r^2 \sin\frac{\pi}{n} \cos\frac{\pi}{n}$ and $C_n = 2 n r \sin\frac{\pi}{n}$.


Deduce that $A = \tfrac{1}{2} r C$ by taking the limit of $A_n / C_n$ as $n \to \infty$.


Figure 2


EXERCISE 4. With $A_n$ and $C_n$ as in Exercise 3, and $r_n$, $s_n$ as in Figure 1, write
$A_n = \tfrac{1}{2} n r_n s_n$ and $C_n = n s_n$. Deduce that $A = \tfrac{1}{2} r C$ without using trigonometric
functions, by taking the limit of $A_n / C_n$ as $n \to \infty$. What must you assume to be
“obvious”?
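

The limit in Exercises 3 and 4 can be watched numerically. The following is a minimal sketch, assuming the trigonometric formulas of Exercise 3; the function name `inscribed_area_and_perimeter` and the sample radius are illustrative choices, not part of the original text.

```python
import math

def inscribed_area_and_perimeter(n, r=1.0):
    """Area A_n and perimeter C_n of a regular n-gon inscribed in a circle of radius r."""
    half_angle = math.pi / n
    area = n * r**2 * math.sin(half_angle) * math.cos(half_angle)
    perimeter = 2 * n * r * math.sin(half_angle)
    return area, perimeter

r = 2.0
for n in (6, 96, 6144):
    area, perimeter = inscribed_area_and_perimeter(n, r)
    # A_n / C_n equals (r/2) cos(pi/n), which tends to r/2 as n grows
    print(n, area / perimeter)
```

The ratio is exactly $(r/2)\cos(\pi/n)$, so the printed values increase toward $r/2$, in agreement with $A = \tfrac{1}{2} r C$.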


In order to obtain the approximation (2) for $\pi$, Archimedes began with
regular hexagons inscribed in and circumscribed about a circle of radius
one. By successively doubling the number of sides he obtained pairs of
inscribed and circumscribed regular polygons with 12, 24, 48, and 96 sides,
and calculated their perimeters to find upper and lower bounds for $\pi$.

Consider first the circumscribed polygons. If $t_n$ denotes half the length of
a side of a regular circumscribed polygon with n sides, then the relationship
between $t_n$ and $t_{2n}$ is indicated in Figure 2, where O is the center of
the circle, and OD bisects angle AOC. If CP is parallel to OD, it is easily
seen that OP = CO. Since triangles ADO and ACP are similar, it follows
that


$\frac{AD}{AO} = \frac{AC}{AO + OP} = \frac{AC}{AO + OC}$,

or


$t_{2n} = \frac{t_n}{1 + \sqrt{1 + t_n^2}}$. (3)


Now consider the inscribed polygons. If $s_n$ denotes the side of the
regular inscribed polygon with n sides, the relationship between $s_n$ and $s_{2n}$
is indicated by Figure 3, where $s_n = BC$, $s_{2n} = BD$, and AD bisects angle
BAC (Why?). It is easily checked that the triangles ABD, BPD, and APC
are similar. Hence


$\frac{AB}{AD} = \frac{BP}{BD}$ and $\frac{AC}{AD} = \frac{PC}{BD}$,

so

$\frac{AB + AC}{AD} = \frac{BP + PC}{BD} = \frac{BC}{BD}$,

or

$\frac{2 + \sqrt{4 - s_n^2}}{\sqrt{4 - s_{2n}^2}} = \frac{s_n}{s_{2n}}$.


After cross-multiplying and squaring, the resulting equation yields


$s_{2n}^2 = \frac{s_n^2}{2 + \sqrt{4 - s_n^2}}$. (4)


EXERCISE 5. Observe from the familiar geometry of the regular hexagon that $s_6 = 1$,
$t_6 = 1/\sqrt{3}$. Apply formulas (3) and (4) to compute $s_{12}, s_{24}, s_{48}, s_{96}$ and
$t_{12}, t_{24}, t_{48}, t_{96}$ recursively (using a hand calculator) so as to obtain


$s_{96} = 0.065438$ and $t_{96} = 0.032737$.

Since $48 s_{96} < \pi < 96 t_{96}$, conclude that

$3\tfrac{10}{71} < \pi < 3\tfrac{1}{7}$.


Hence $\pi = 3.14$, rounded off to two decimal places.
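

The recursion of Exercise 5 is easy to mechanize. The sketch below is a modern floating-point illustration of formulas (3) and (4) for the unit circle; the function names and the helper `archimedes_bounds` are assumptions made for this example, and the arithmetic is of course not the hand computation Archimedes performed.

```python
import math

def double_circumscribed(t):
    """Formula (3): half-side t_2n obtained from t_n (unit circle)."""
    return t / (1.0 + math.sqrt(1.0 + t * t))

def double_inscribed(s):
    """Formula (4): side s_2n obtained from s_n (unit circle)."""
    return math.sqrt(s * s / (2.0 + math.sqrt(4.0 - s * s)))

def archimedes_bounds(n0, s0, t0, doublings):
    """Double the number of sides repeatedly, starting from an n0-gon."""
    n, s, t = n0, s0, t0
    for _ in range(doublings):
        n, s, t = 2 * n, double_inscribed(s), double_circumscribed(t)
    # half-perimeters of the inscribed and circumscribed polygons bracket pi
    return n, s, t, n * s / 2.0, n * t

n, s96, t96, lower, upper = archimedes_bounds(6, 1.0, 1.0 / math.sqrt(3.0), 4)
print(n, round(s96, 6), round(t96, 6), lower, upper)
```

The same helper can be started from the decagon instead of the hexagon, for example `archimedes_bounds(10, (math.sqrt(5) - 1) / 2, math.tan(math.radians(18)), 6)`, which carries the doubling to 640 sides.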


Of course, Archimedes did not have a hand calculator available. He
started with the approximation


$\frac{265}{153} < \sqrt{3} < \frac{1351}{780}$


and proceeded manually, carefully rounding down in calculating the $s_n$’s
and rounding up in calculating the $t_n$’s, finally obtaining

$3\tfrac{10}{71} < \frac{6336}{2017\frac{1}{4}} < \pi < \frac{14688}{4673\frac{1}{2}} < 3\tfrac{1}{7}$.
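

These rational bounds can be checked exactly with rational arithmetic. The following sketch only verifies the stated inequalities; it does not reconstruct the intermediate steps of the manual computation, and the variable names are illustrative.

```python
import math
from fractions import Fraction

# Starting bounds for sqrt(3), checked by comparing squares with 3
low_sqrt3 = Fraction(265, 153)
high_sqrt3 = Fraction(1351, 780)
print(low_sqrt3**2 < 3 < high_sqrt3**2)

# Final chain of bounds, with the mixed numbers written as exact fractions
chain = [
    3 + Fraction(10, 71),
    Fraction(6336) / (2017 + Fraction(1, 4)),
    Fraction(14688) / (4673 + Fraction(1, 2)),
    3 + Fraction(1, 7),
]
print(all(a < b for a, b in zip(chain, chain[1:])))
print(chain[1] < math.pi < chain[2])
```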


It is generally believed that the extant Measurement of a Circle is only a 
fragment of Archimedes’ original and more comprehensive treatment of 
the circle. In a recent article W. R. Knorr [10] argues persuasively that, to 
obtain a more accurate approximation to $\pi$, Archimedes started with
inscribed and circumscribed decagons (regular 10-sided polygons) and
successively doubled sides six times to obtain inscribed and circumscribed
regular polygons with 640 sides.


EXERCISE 6. Starting with the fact that the side of a regular decagon inscribed in the
unit circle is $s_{10} = 2 \sin 18° = (\sqrt{5} - 1)/2$, apply formulas (3) and (4) to recursively
calculate $s_{640}$ and $t_{640}$, carrying 8 decimal places on a hand calculator. Thence
verify that $\pi = 3.1416$, rounded off to 4 decimal places.


EXERCISE 7. Let $p_n$ and $P_n$ denote the perimeters of the inscribed and circumscribed
regular n-sided polygons for the unit circle. Noting that


$s_n = 2 \sin\frac{\pi}{n}$ and $t_n = \tan\frac{\pi}{n}$,

show that


$p_{2n} = \sqrt{p_n P_{2n}}$ and $P_{2n} = \frac{2 p_n P_n}{p_n + P_n}$.


Starting with $p_4 = 4\sqrt{2}$ and $P_4 = 8$ for inscribed and circumscribed squares, use
these recursive formulas to calculate $p_{64}$ and $P_{64}$. What bounds on $\pi$ does this
computation give?


EXERCISE 8. If $a_n$ and $A_n$ are the areas of the inscribed and circumscribed polygons
in the previous exercise, show that


$a_{2n} = \sqrt{a_n A_n}$ and $A_{2n} = \frac{2 a_{2n} A_n}{a_{2n} + A_n}$.

The Quadrature of the Parabola 


A segment of a convex curve is a region bounded by a straight line and a 
portion of the given curve (Fig. 4). In the preface to the Quadrature of the 
Parabola, Archimedes remarks that earlier mathematicians had success- 
fully attempted to find the area of a segment of a circle or hyperbola, but 
that apparently no one had previously attempted the quadrature of a 
segment of a parabola—precisely the one that can be carried out by the 
method of exhaustion. 

The parabola was originally defined by the Greeks as a conic section.
That is, given a circular (double) cone with vertical axis, a parabola is the
curve of intersection of the cone with a plane that is parallel to a
generating element of the cone. Other positions of the plane yield ellipses
and hyperbolas. If the plane is horizontal then the section is a circle. The
parabola is obviously symmetric with respect to a certain straight line in
the plane containing it; this line is called the axis of the parabola (Fig. 5).
Given a parabolic segment with base AB (Fig. 6), the point P of the
segment that is farthest from the base is called the vertex of the segment,
and the (perpendicular) distance from P to AB is its height. Archimedes
showed that the area of the segment is four-thirds that of the inscribed
triangle APB. That is, the area of a segment of a parabola is 4/3 times the
area of a triangle with the same base and height. He gave two separate
proofs of the result; we will discuss here the second one.


Figure 4


Figure 5


Figure 7

By the time of Archimedes, the following facts were known concerning 
an arbitrary parabolic segment APB. 


(a) The tangent line at P is parallel to the base AB. 

(b) The straight line through P parallel to the axis intersects the base AB 
in its midpoint M. 

(c) Every chord QQ’ parallel to the base AB is bisected by the diameter 
PM. 

(d) With the notation in Figure 7,


$\frac{PV}{PM} = \frac{QV^2}{BM^2}$. (5)


That is, in the pictured oblique xy-coordinate system, the equation of the
parabola is of the form $x = k y^2$. Archimedes quotes these facts without
proof, referring to earlier treatises on the conics by Euclid and Aristaeus.

There is a natural parallelogram circumscribed about a parabolic segment
APB, having AB as a side, and with its base AA’ and top BB’ parallel
to the diameter PM (Fig. 8). Since the area of the inscribed triangle APB is
half that of the circumscribed parallelogram, it follows that the area of this
triangle is more than half of the area of the parabolic segment APB.

Now consider the two smaller parabolic segments with bases PB and
AP; let their vertices be $P_1$ and $P_2$, respectively (Fig. 8). In the same way
as above, it follows that the areas of the inscribed triangles $PP_1B$ and
$AP_2P$ are more than half of the areas of these two segments.


Figure 8


We have begun to exhaust the area of the original parabolic segment
APB with inscribed polygons. The triangle APB is our first inscribed
polygon, and $AP_2PP_1B$ is the second. We continue in this way, adding at
each step the triangles inscribed in the parabolic segments remaining from
the previous step. Since the total area of these inscribed triangles is more
than half that of the segments, it follows from Eudoxus’ principle that,
given $\epsilon > 0$, we obtain after a finite number of steps an inscribed polygon
whose area differs from that of the segment APB by less than $\epsilon$.


Now we want to show that the sum of the areas of the triangles $AP_2P$
and $PP_1B$ is $\tfrac{1}{4}$ that of $\triangle APB$. Let $M_1$ be the midpoint of BM, Y the point
of intersection of $P_1M_1$ and PB, and V the intersection with PM of the line
through $P_1$ parallel to AB. Then


$BM^2 = 4\,M_1M^2$,

so it follows from (5) that

$PM = 4PV$ or $P_1M_1 = 3PV$.

But $YM_1 = \tfrac{1}{2} PM = 2PV$, so that

$YM_1 = 2P_1Y$.

It follows from this that

$a(\triangle PP_1B) = \tfrac{1}{2} a(\triangle PM_1B) = \tfrac{1}{4} a(\triangle PMB)$,


applying twice the fact that the ratio of the areas of two triangles with the
same base is equal to the ratio of their heights. Similarly


$a(\triangle AP_2P) = \tfrac{1}{4} a(\triangle APM)$,

so we find that

$a(\triangle PP_1B) + a(\triangle AP_2P) = \tfrac{1}{4} a(\triangle PMB) + \tfrac{1}{4} a(\triangle APM) = \tfrac{1}{4} a(\triangle APB)$

as desired.
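In the same way it can be proved that the sum of the areas of the
inscribed triangles added at each step is equal to $\tfrac{1}{4}$ of the sum of the areas
of the triangles added at the previous step. If we write


$a = a(\triangle APB)$,


it follows that the polygon $\mathcal{P}_n$ obtained after n steps has area

$a(\mathcal{P}_n) = a + \frac{a}{4} + \frac{a}{4^2} + \cdots + \frac{a}{4^n}$. (6)


Consequently, given $\epsilon > 0$, the area a(APB) of the parabolic segment
differs from the right-hand side of (6) by less than $\epsilon$ if n is sufficiently
large.

At this point Archimedes derives the elementary identity

$1 + \frac{1}{4} + \frac{1}{4^2} + \cdots + \frac{1}{4^n} + \frac{1}{3} \cdot \frac{1}{4^n} = \frac{4}{3}$. (7)

This follows from the observation that

$\frac{1}{4^k} + \frac{1}{3} \cdot \frac{1}{4^k} = \frac{4}{3 \cdot 4^k} = \frac{1}{3} \cdot \frac{1}{4^{k-1}}$,

for then

$1 + \frac{1}{4} + \cdots + \frac{1}{4^{n-1}} + \left( \frac{1}{4^n} + \frac{1}{3} \cdot \frac{1}{4^n} \right)$
$= 1 + \frac{1}{4} + \cdots + \frac{1}{4^{n-1}} + \frac{1}{3} \cdot \frac{1}{4^{n-1}}$
$= \cdots$
$= 1 + \left( \frac{1}{4} + \frac{1}{3} \cdot \frac{1}{4} \right)$
$= 1 + \frac{1}{3} = \frac{4}{3}$.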

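It is tempting to simply sum the geometric series by letting $n \to \infty$ in (7)
to obtain


$1 + \frac{1}{4} + \frac{1}{4^2} + \cdots + \frac{1}{4^n} + \cdots = \frac{4}{3}$.

We would then conclude that

$a(APB) = \lim_{n \to \infty} a(\mathcal{P}_n)$
$= \lim_{n \to \infty} a \left( 1 + \frac{1}{4} + \frac{1}{4^2} + \cdots + \frac{1}{4^n} \right)$
$= a \left( 1 + \frac{1}{4} + \cdots + \frac{1}{4^n} + \cdots \right)$,


$a(APB) = \tfrac{4}{3} a = \tfrac{4}{3} a(\triangle APB)$

as desired. □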


No doubt Archimedes intuitively obtained the answer 4/3 in similar 
fashion but, rather than taking limits explicitly, he concluded the proof 
with a typical double reductio ad absurdum argument which we leave to the 
reader. 
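

Identity (7) can be confirmed for any particular n with exact rational arithmetic. The sketch below is only a finite check of the identity, in the spirit of Archimedes' avoidance of the infinite sum; the function name `partial_sum` is an illustrative assumption.

```python
from fractions import Fraction

def partial_sum(n):
    """1 + 1/4 + ... + 1/4**n as an exact fraction."""
    return sum(Fraction(1, 4**k) for k in range(n + 1))

for n in range(1, 6):
    # Identity (7): adding one third of the last term always gives 4/3
    total = partial_sum(n) + Fraction(1, 3) * Fraction(1, 4**n)
    print(n, partial_sum(n), total)
```

The partial sums themselves never reach 4/3; the deficit is exactly one third of the last term, which is the quantity the double reductio argument must control.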


EXERCISE 9. Supply this concluding argument, using the facts that


(a) Given $\epsilon > 0$, a(APB) and $a(\mathcal{P}_n)$ differ by less than $\epsilon$ if n is sufficiently large,
and


(b) $a(\mathcal{P}_n) = \frac{4a}{3} - \frac{a}{3} \cdot \frac{1}{4^n}$ (from (6) and (7)).


If Archimedes’ result is applied to the segment bounded by the parabola
$y = x^2$ and the horizontal line $y = 1$, we find that its area is 4/3, so it
follows that the area under the parabola and over the interval $0 \le x \le 1$ is
$\tfrac{1}{3}$. In modern integral notation this means that


$\int_0^1 x^2 \, dx = \frac{1}{3}$.
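

The exhaustion construction can be carried out explicitly for this segment of $y = x^2$. The sketch below assumes, as in fact (b), that the vertex of the segment over an interval lies above the midpoint of that interval (the diameters of this parabola are vertical); the function names are illustrative, and exact fractions are used so that the factor 1/4 appears without rounding.

```python
from fractions import Fraction

def triangle_area(u, v):
    """Area of the triangle inscribed in the segment of y = x**2 over [u, v]."""
    m = (u + v) / 2                      # abscissa of the vertex of the segment
    ax, ay, bx, by, cx, cy = u, u * u, m, m * m, v, v * v
    return abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2

def exhaustion_areas(u, v, steps):
    """Total triangle area added at each step of the construction."""
    intervals, added = [(u, v)], []
    for _ in range(steps):
        added.append(sum(triangle_area(a, b) for a, b in intervals))
        intervals = [piece for a, b in intervals
                     for piece in ((a, (a + b) / 2), ((a + b) / 2, b))]
    return added

added = exhaustion_areas(Fraction(-1), Fraction(1), 6)
print(added)        # each entry is one quarter of the previous one
print(sum(added))   # area of the inscribed polygon after these steps
```

Here the first triangle has area 1, so the totals approach 4/3, and the area under the parabola over $0 \le x \le 1$ is half of $2 - 4/3$.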


The Area of an Ellipse 


Although Archimedes was unable to compute the area of an arbitrary 
segment of an ellipse, he did show (in On Conoids and Spheroids) that the 
area of the complete ellipse with major and minor semi-axes a and b is 


A = mab, (8) 


a pleasant generalization of the formula for the area of a circle (the circle 
of radius r being an ellipse with a= b=r). 

Archimedes’ proof of (8) is based on the following characteristic prop- 
erty of an ellipse. The circle of radius a, circumscribed about the ellipse as 
in Figure 9, is called its auxiliary circle. Given a point P on the major 
(horizontal) axis of the ellipse, let Q be the point on the ellipse and R the 
point on the circle above P. Then 


$\frac{PQ}{PR} = \frac{b}{a}$. (9)


Figure 9


Figure 10


This is obvious from the rectangular coordinates equation


$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$

for the ellipse, which gives

$PQ = y = \frac{b}{a} \sqrt{a^2 - x^2} = \frac{b}{a} PR$.

To give the proof of (8), we start with an ellipse E with major and minor
semi-axes a and b, and with auxiliary circle C’. Let C” be a circle of radius
$r = \sqrt{ab}$, so a(C”) = $\pi ab$ (see Fig. 10). We want to prove that


a(E) = a(C”).


Assuming that a(E) < a(C”), let P” be a regular polygon inscribed in
C”, having its number of sides equal to a multiple of 4, and with opposite
ends of the horizontal diameter of C” as vertices, such that


a(P”) > a(E). (10)

If P’ is a similar regular polygon inscribed in the auxiliary circle C’, then


$\frac{a(P'')}{a(P')} = \frac{r^2}{a^2} = \frac{ab}{a^2} = \frac{b}{a}$. (11)

Now let P be the polygon inscribed in the ellipse E whose vertices are
the intersections with E of the perpendiculars from the vertices of P’ to the
horizontal axis of E. We can consider the polygons P and P’ as unions of
corresponding pairs of triangles like Qrs and QRS, and corresponding
pairs of trapezoids like klmn and KLMN. Now the characteristic property
(9) of the ellipse implies that the corresponding vertical sides of these
triangles and trapezoids are in the ratio b/a, so it follows that the areas of
each corresponding pair of triangles or trapezoids are in the same ratio b/a.
Consequently, by pairwise comparison of the triangles and trapezoids
forming P with those forming P’, we see that


$\frac{a(P)}{a(P')} = \frac{b}{a}$. (12)


But (11) and (12) imply that a(P) = a(P”), which contradicts (10) because
P is inscribed in E. Therefore a(C”) is not greater than a(E).

Assuming next that a(C”) < a(E), we start with a polygon P like the one
above inscribed in the ellipse E, such that


a(P) > a(C”).

Then let P’ be the polygon inscribed in the auxiliary circle C’ whose
vertices are the intersections with C’ of vertical lines through pairs of
vertices of P, and let P” be the similar polygon inscribed in the circle C”.
By the same computations as in the first case we find that

$\frac{a(P)}{a(P')} = \frac{b}{a} = \frac{a(P'')}{a(P')}$.

But then a(P”) = a(P) > a(C”), which is a contradiction because P” is
inscribed in C”. This completes the double reductio ad absurdum proof that


a(E) = a(C”) = $\pi ab$. □


In essence, Archimedes has simply given a rigorous exhaustion proof of the 
intuitively evident fact that the area of the ellipse is b/a times the area $\pi a^2$
of its auxiliary circle, corresponding to the observation that the circle is 
transformed into the ellipse by shrinking its vertical dimension by the 
factor b/a. 
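

The pairwise comparison behind (12) can be illustrated numerically. The following sketch takes a regular polygon inscribed in the auxiliary circle, drops its vertices onto the ellipse by the vertical scaling b/a, and compares the two areas with the shoelace formula; the function names and the sample semi-axes are assumptions made for this example.

```python
import math

def shoelace(points):
    """Area of a simple polygon given its vertices in order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def circle_and_ellipse_polygons(a, b, n):
    """Regular n-gon in the auxiliary circle and the corresponding polygon
    in the ellipse, obtained by shrinking vertical coordinates by b/a."""
    circle_poly = [(a * math.cos(2 * math.pi * k / n),
                    a * math.sin(2 * math.pi * k / n)) for k in range(n)]
    ellipse_poly = [(x, y * b / a) for x, y in circle_poly]
    return circle_poly, ellipse_poly

a, b = 3.0, 2.0
for n in (8, 64, 1024):
    P_circle, P_ellipse = circle_and_ellipse_polygons(a, b, n)
    print(n, shoelace(P_ellipse) / shoelace(P_circle), shoelace(P_ellipse))
print(math.pi * a * b)
```

The ratio of areas is b/a for every n, while the area of the polygon in the ellipse increases toward $\pi ab$.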


The Volume and Surface Area of a Sphere 


To the modern reader, the treatise On the Sphere and Cylinder probably 
seems the most elegant and inventive of Archimedes’ works. The author 
himself apparently concurred with this judgment, for he requested that on 
his tombstone be carved a sphere inscribed in a right circular cylinder 
whose height equals its diameter. When the Roman orator Cicero was later
serving as quaestor in Sicily, he found and restored the tomb with this
inscription. The Romans had so little interest in pure mathematics that this
action by Cicero was probably the greatest single contribution of any
Roman to the history of mathematics.


Figure 11

This tombstone carving symbolized the two principal results of Book I 
(of two) of On the Sphere and Cylinder, to the effect that the surface area S 
and the volume V of a sphere of radius r are given by 


$S = 4\pi r^2$ and $V = \tfrac{4}{3}\pi r^3$. (13)


The connection with the circumscribed cylinder is that S is two-thirds of 
the total surface area of the cylinder (including its two ends), while V is 
two-thirds of the volume of the cylinder. Thus the ratio of the surface areas 
of the sphere and cylinder is the same as the ratio of their volumes! 

We saw in Chapter 1 that Euclid (or Eudoxus) proved that the volume of 
a sphere is proportional to the cube of its radius, $V = \alpha r^3$, but did not
discover that $\alpha = 4\pi/3$. The relationship between V and the surface area S
is suggested by a heuristic argument similar to the one mentioned earlier in 
this chapter in the discussion of the area and the circumference of a circle. 
We regard the sphere as approximately the union of indefinitely many 
pyramids with the center of the sphere as their common vertex, and with 
their bases forming a polyhedral surface with indefinitely many faces 
inscribed in the sphere, each of which almost coincides with a small piece 
of the sphere. Since the height of each of the pyramids will virtually equal 
the radius of the sphere, and the volume of a pyramid is one-third of the 
product of its height and its base, it seems evident that 


$V = \tfrac{1}{3} r S$. (14)


Assuming formula (14), either of the formulas in (13) may be deduced
from the other. Archimedes initially derived $S = 4\pi r^2$ from $V = \tfrac{4}{3}\pi r^3$, for he
states in his treatise The Method (discussed later in this chapter) that 


From the theorem that a sphere is four times as great as the cone with a 
great circle of the sphere as base and with height equal to the radius of the 
sphere I conceived the notion that the surface of any sphere is four times 
as great as a great circle in it; for, judging from the fact that any circle is 
equal to a triangle with base equal to the circumference and height equal 
to the radius of the circle, I apprehended that, in like manner, any sphere 
is equal to a cone with base equal to the surface of the sphere and height
equal to the radius. 
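

In modern notation the step described in this passage takes two lines. The sketch below assumes formula (14) and the usual value $\tfrac{1}{3}\pi r^2 \cdot r$ for the volume of the cone with a great circle as base and height r; it restates the heuristic and is not Archimedes' proof.

```latex
\[
V = 4 \cdot \tfrac{1}{3}\pi r^{2} \cdot r = \tfrac{4}{3}\pi r^{3},
\qquad
S = \frac{3V}{r} = \frac{3}{r} \cdot \tfrac{4}{3}\pi r^{3} = 4\pi r^{2}.
\]
```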


The concept of area is inherently more complicated for curvilinear 
surfaces in space than it is for plane regions. Before he could proceed to 
compute surface areas, it was necessary for Archimedes first to limit the 
class of surfaces to be considered, and then to introduce axioms that serve 
(in modern terms) to define the concept of surface area. With striking 
insight he stated the following definitions and axioms at the beginning of 
On the Sphere and Cylinder I. 

Let C be a bounded plane curve (i.e., one having two endpoints) that lies
on one side of the line L through its endpoints. Then the curve C is called
convex if, for any two points P and Q of C, the line segment from P to Q is
wholly contained in the region that is bounded by the curve C and the
segment of L joining the endpoints of C (Fig. 12).

Similarly, let S be a surface bounded by a simple closed curve J in the
plane M, such that S lies on one side of M, and denote by $\Sigma$ the plane
region in M that is bounded by J. Then the surface S is called convex if,
for any two points P and Q of S, the line segment joining P and Q is
wholly contained by the region (in space) that is bounded by $S \cup \Sigma$
(Fig. 13).

For simple (unbounded) closed curves or surfaces the definition of
convexity can be stated more simply. A simple closed curve (in a plane) or 
surface (in space) is convex if the region bounded by it wholly contains any 
line segment joining two points of it. 

Archimedes discusses curve length and surface area only for convex
curves and surfaces. Realizing that these are new geometric magnitudes
different from any studied by his predecessors, he introduces the following
convexity axioms for their definition and computation.


Figure 12 (Convex; Not Convex)


Figure 13


Figure 14


Figure 15


I. (For curves). If C and C’ are convex curves with the same endpoints,
and C is contained in the region bounded by C’ and the line segment
joining its endpoints, then the length l(C) of C is less than the length of C’,


l(C) < l(C’).


That is, if one convex curve is “included within” another (Fig. 14), then the
one included is shorter. Also, a straight line segment is the shortest curve
joining two given points.


II. (For surfaces). If S and S’ are convex surfaces with the same
boundary curve which bounds a region $\Sigma$ in a plane, and S is contained in
the region bounded by $S' \cup \Sigma$, then


a(S) < a(S’).


That is, if one convex surface is “included within” another, then the
included surface has the lesser area (Fig. 15). Also, of all surfaces bounded
by a given plane curve, the region in the plane has the least area.


To illustrate the application of the convexity axioms in the method of
compression, consider a convex closed curve C with an inscribed polygon
P and a circumscribed polygon Q (Fig. 16). Then the convexity axiom for
curves implies that the typical line segment AB is shorter than the curve
segment AB, and that the curve segment is shorter than the portion
$EG \cup GF$ of the circumscribed polygon. Hence

l(P) < l(C) < l(Q). (15)


If we can find a sequence $\{P_n\}$ of inscribed polygons and a sequence $\{Q_n\}$
of circumscribed polygons, such that


$\lim_{n \to \infty} l(P_n) = \lim_{n \to \infty} l(Q_n) = L$,

where we can compute L, then (15) implies that

l(C) = L.


In essence, the convexity axiom requires that we define l(C) to equal L,
this being the only value for the length of C that is consistent with the
convexity axiom.


Archimedes begins his investigation of surface area with the proofs of
the formulas


$A = 2\pi r h$ (16)


for the surface area (excluding the bases) of a cylinder with radius r and
height h, and


$A = \pi r s$ (17)


for the surface area (excluding the base) of a cone with base radius r and
“slant height” $s = \sqrt{r^2 + h^2}$. We shall discuss here the case of the cone,
and leave the cylinder to an exercise for the reader.

A heuristic derivation of (17) may be obtained by “unrolling” the cone
onto a circular sector (Fig. 17) with radius s and central angle


$\theta = \frac{2\pi r}{2\pi s} \cdot 2\pi = \frac{2\pi r}{s}$ radians.

The formula $A = \tfrac{1}{2} s^2 \theta$ for the area of the circular sector then yields (17).


Now let C denote the lateral surface of a right circular cone with vertex
V whose base B is a circle of radius r. If P and Q are regular polygons,
with P inscribed in B and Q circumscribed about B, then denote by VP
and VQ the corresponding pyramids inscribed in and circumscribed about
the cone C. We would like to conclude from the convexity axiom for
surface area that the lateral surface areas a(VP) and a(VQ) of these
inscribed and circumscribed pyramids are, respectively, less than and
greater than the lateral surface area a(C) of the cone,


a(VP) < a(C) < a(VQ). (18)


These inequalities do not follow immediately from the convexity axiom, 
because the three convex surfaces have different boundary curves, but are 
established by ingenious arguments in Propositions 9 and 10 of On the 
Sphere and Cylinder I. 
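

The inequalities (18) can at least be illustrated numerically for regular polygons, taking the value $\pi r s$ of formula (17) for a(C). The sketch below is such an illustration and not a substitute for Propositions 9 and 10; the function name and the sample dimensions are assumptions made for this example.

```python
import math

def pyramid_lateral_areas(r, h, n):
    """Lateral areas of the pyramids VP and VQ on regular n-gons inscribed in
    and circumscribed about the base circle of a cone (radius r, height h)."""
    s = math.hypot(r, h)                       # slant height of the cone
    apothem_in = r * math.cos(math.pi / n)     # center-to-side distance of P
    slant_in = math.hypot(apothem_in, h)       # vertex-to-side distance for VP
    perim_in = 2 * n * r * math.sin(math.pi / n)
    perim_out = 2 * n * r * math.tan(math.pi / n)
    # each face is a triangle: half its base times its slant height
    return 0.5 * perim_in * slant_in, 0.5 * perim_out * s

r, h = 1.0, 2.0
cone = math.pi * r * math.hypot(r, h)          # formula (17)
for n in (6, 48, 384):
    inner, outer = pyramid_lateral_areas(r, h, n)
    print(n, inner, cone, outer)
```

For the circumscribed pyramid each side of Q touches the base circle at its midpoint, so the slant height of every face is exactly s; for the inscribed pyramid it is slightly less.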

Archimedes then proves that a(C) = $\pi r s$ by an interesting ratio version
of the method of compression. Let B’ be a circle with radius $R = \sqrt{rs}$; we
want to show that


$a(C) = a(B') = \pi R^2 = \pi r s$.


Assuming first that a(C) > a(B’), choose polygons P’ and Q’ inscribed
in and circumscribed about B’ such that


$\frac{a(Q')}{a(P')} < \frac{a(C)}{a(B')}$.

(Archimedes has established in Proposition 3 the rather obvious fact that,
given $\alpha > 1$, there exist inscribed and circumscribed polygons P and Q in a
circle such that $a(Q)/a(P) < \alpha$.) Let P and Q be similar polygons inscribed
in and circumscribed about the base B of the cone. Then


$\frac{a(Q)}{a(Q')} = \frac{r^2}{R^2} = \frac{r}{s} = \frac{a(Q)}{a(VQ)}$,


so it follows that a(VQ) = a(Q’). Hence


$\frac{a(VQ)}{a(P')} = \frac{a(Q')}{a(P')} < \frac{a(C)}{a(B')}$.


But this is a contradiction, because a(VQ) > a(C) by (18), while a(P’) <
a(B’) since P’ is inscribed in B’. Therefore a(C) is not greater than a(B’).


Figure 19


Figure 20


Next suppose that a(C) < a(B’), and choose regular polygons P’ and Q’
inscribed in and circumscribed about B’ such that

$\frac{a(Q')}{a(P')} < \frac{a(B')}{a(C)}$.

Again let P and Q be similar polygons inscribed in and circumscribed
about B. Then


$\frac{a(P)}{a(P')} = \frac{r^2}{R^2} = \frac{r}{s} > \frac{a(P)}{a(VP)}$. (19)


This is true because

$\frac{a(P)}{a(VP)} = \frac{r^*}{s^*}$,

where $r^*$ is a perpendicular from the center of B to a side of P, while $s^*$ is
a perpendicular from V to a side of P. But then


$\frac{r^*}{s^*} = \frac{r \cos\alpha}{s \cos\beta} < \frac{r}{s}$


because it is evident that $\alpha > \beta$, so $\cos\alpha < \cos\beta$ (Fig. 20).
From (19) we see that a(VP) > a(P’), so


$\frac{a(Q')}{a(VP)} < \frac{a(B')}{a(C)}$.


But this is a contradiction because a(VP) < a(C) by (18), while a(Q’) >
a(B’) since Q’ is circumscribed about B’. Consequently we conclude that


$a(C) = a(B') = \pi r s$

as desired. □


From (17) it is easy to derive the formula for the lateral surface area of a
frustum of a cone,

$A = \pi (r_1 + r_2) s$, (20)


where $r_1$ and $r_2$ are the radii of the bases and s is the slant height of the
frustum (Fig. 21).


Figure 21


EXERCISE 10. Derive formula (20) for the lateral surface area of a conical frustum.


EXERCISE 11. Give a rigorous proof by the method of compression of the formula
$A = 2\pi r h$ for the lateral surface area of a cylinder, following in outline the above
proof of the formula $A = \pi r s$ for the lateral surface area of a cone (replacing
pyramids with prisms).


The Sphere 


In order to investigate the surface area of a sphere S with diameter d = 2r,
Archimedes inscribes a regular polygon P with 2n sides in a great circle of
S, and then rotates it about a diameter through a pair of opposite vertices
of P. The surface $\Sigma'$ of the resulting solid of revolution V’ then consists of
two cones and n − 2 frusta of cones, each with slant height s’ equal to the
length of a side of P. The radii $a_1, a_2, \ldots, a_{n-1}$ of their bases are
semi-chords of the great circle, drawn through the vertices of P perpendicular
to the axis AC of revolution (Fig. 22).

Applying the formulas for the surface of a cone and of a frustum of a
cone (Eq. (20)) to compute the surface area of the inscribed surface $\Sigma'$, we
obtain

$a(\Sigma') = \pi a_1 s' + \pi (a_1 + a_2) s' + \cdots + \pi (a_{n-2} + a_{n-1}) s' + \pi a_{n-1} s' = 2\pi s' \sum_{i=1}^{n-1} a_i$. (21)


Figure 22


Now we divide the diameter AC into segments of lengths $b_1, b_1, b_2,
b_2, \ldots, b_{n-1}, b_{n-1}$ as indicated in Figure 22. Then from the obvious
similar triangles we see that

$\frac{a_1}{b_1} = \frac{a_2}{b_2} = \cdots = \frac{a_{n-1}}{b_{n-1}} = \frac{BC}{s'}$.

Since $2(b_1 + b_2 + \cdots + b_{n-1}) = 2r = d$, it follows by addition of these
ratios that

$\frac{2(a_1 + a_2 + \cdots + a_{n-1})}{d} = \frac{BC}{s'}$,

so

$2 s' \sum_{i=1}^{n-1} a_i = d \cdot BC = 4 r^2 \cos\theta$,

where $\theta$ is the angle ACB (and, to avoid anachronism in describing
Archimedes’ computation, $\cos\theta$ is simply an abbreviation for the ratio
BC/AC). Consequently (21) becomes


$a(\Sigma') = 4\pi r^2 \cos\theta$. (22)


In particular, we see that the area of the inscribed surface $\Sigma'$ is less than
$4\pi r^2$. Also, it follows from the convexity axiom that $a(\Sigma') < a(S)$.

Next we want to show that the surface area of a similar but circumscribed
surface is greater than $4\pi r^2$. Let Q be a regular polygon with sides
of length s”, similar to P but circumscribed about the great circle with
diameter AC (Fig. 23). Upon rotating it about this diameter, we obtain a
solid of revolution V” with surface $\Sigma''$.

Since $\Sigma''$ may be regarded as a surface similar to $\Sigma'$, but inscribed in a
slightly larger sphere with diameter (see Fig. 23)


$d' = A'C' = 2r \sec\theta$,

the above calculation immediately yields

$a(\Sigma'') = \pi d' \cdot B'C' = 4\pi r^2 \sec\theta$, (23)


since B’C’ = 2r by similar triangles (Fig. 23). Because $\sec\theta = AC/BC > 1$,
we see that the area of the circumscribed surface $\Sigma''$ is greater than $4\pi r^2$.
Also, $a(\Sigma'') > a(S)$ by the convexity axiom.

Equations (22) and (23) together give


$4\pi r^2 \cos\theta < a(S) < 4\pi r^2 \sec\theta$,

from which the limit as $n \to \infty$, $\theta \to 0$ obviously yields $a(S) = 4\pi r^2$. Of
course Archimedes concludes the proof with the usual double reductio ad
absurdum argument.


Figure 23
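

Formulas (22) and (23) can be compared with a direct summation of the cones and frusta in (21). The following sketch places the vertices of the 2n-gon at the angles $k\pi/n$, so that $\theta = \pi/2n$; this coordinate description, the function name, and the sample values of n are assumptions made for the illustration.

```python
import math

def inscribed_surface_area(r, n):
    """Area of the surface swept by a regular 2n-gon inscribed in a great
    circle of radius r, rotated about a diameter through two opposite
    vertices; summed piece by piece as in (21)."""
    side = 2 * r * math.sin(math.pi / (2 * n))                      # s'
    radii = [r * math.sin(k * math.pi / n) for k in range(n + 1)]   # ends are 0
    return sum(math.pi * (radii[k] + radii[k + 1]) * side for k in range(n))

r = 1.0
for n in (3, 12, 96):
    theta = math.pi / (2 * n)                                       # angle ACB
    inner = inscribed_surface_area(r, n)
    # the circumscribed surface is the same construction in a sphere of radius r sec(theta)
    outer = inscribed_surface_area(r / math.cos(theta), n)
    print(n, inner, 4 * math.pi * r**2, outer)
```

The two end cones are treated as frusta with one radius equal to zero, which is why a single formula serves for every piece of the sum.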

From the triangles ADO and A’B’C’ in Fig. 23 we see that


$\sin\theta = \frac{s'}{2r}$ and $\tan\theta = \frac{s''}{2r}$,

so

$\sec\theta = \frac{\tan\theta}{\sin\theta} = \frac{s''}{s'}$.

Consequently (22) and (23) imply that

$\frac{a(\Sigma'')}{a(\Sigma')} = \sec^2\theta = \left( \frac{s''}{s'} \right)^2$, (24)


the ratio of the squares of the sides of the polygons Q and P. We are now
ready for the proof that $a(S) = 4\pi r^2$ by the ratio form of the compression
method.

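Assuming that $a(S) > 4\pi r^2$, choose the inscribed and circumscribed
polygons P and Q such that

$$\left(\frac{s''}{s'}\right)^2 < \frac{a(S)}{4\pi r^2}.$$

Then equation (24) gives

$$\frac{a(\Sigma'')}{a(\Sigma')} = \left(\frac{s''}{s'}\right)^2 < \frac{a(S)}{4\pi r^2}.$$

But this is impossible because $a(\Sigma'') > a(S)$ while $a(\Sigma') < 4\pi r^2$.

Assuming that $a(S) < 4\pi r^2$, choose P and Q such that

$$\left(\frac{s''}{s'}\right)^2 < \frac{4\pi r^2}{a(S)}.$$

Then equation (24) gives

$$\frac{a(\Sigma'')}{a(\Sigma')} = \left(\frac{s''}{s'}\right)^2 < \frac{4\pi r^2}{a(S)}.$$

But this is impossible because $a(\Sigma'') > 4\pi r^2$ while $a(\Sigma') < a(S)$. It finally
follows that $a(S) = 4\pi r^2$ as desired. □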


The same geometric construction yields a proof that the volume of the
ball V bounded by the sphere S is $v(V) = 4\pi r^3/3$. Archimedes proves that
the volume of the inscribed solid $V'$ is equal to that of a cone whose base
has area $a(\Sigma')$ and whose height is equal to the length p of a perpendicular
from the center O to a side of the polygon P. Hence

$$v(V') = \tfrac{1}{3}\,p\,a(\Sigma') < \tfrac{4}{3}\pi r^3. \tag{25}$$

Similarly, the volume of the circumscribed solid $V''$ is equal to that of a
cone with height r whose base has area $a(\Sigma'')$. Hence

$$v(V'') = \tfrac{1}{3}\,r\,a(\Sigma'') > \tfrac{4}{3}\pi r^3. \tag{26}$$

The ratio of these volumes is

$$\frac{v(V'')}{v(V')} = \frac{r\,a(\Sigma'')}{p\,a(\Sigma')} = \left(\frac{s''}{s'}\right)^3 \tag{27}$$

because $r/p = \sec\theta = s''/s'$. Finally, equations (25), (26), and (27) provide
the ingredients for a compression proof of $v(V) = 4\pi r^3/3$ that is virtually
identical to the preceding proof that $a(S) = 4\pi r^2$.


EXERCISE 12. Supply the details for this proof that $v(V) = 4\pi r^3/3$ by the ratio form
of the method of compression.

Archimedes also introduced the refinement of the above construction
that is necessary to compute the volume and (curved) surface area of a
segment $\Sigma$ of a sphere of radius r that is cut off by a plane at a distance
$a < r$ from the center. In terms of the height $h = r - a$ and the base radius
$\rho = \sqrt{r^2 - a^2}$ of the segment, these are

$$a(\Sigma) = \pi(\rho^2 + h^2), \qquad v(\Sigma) = \tfrac{1}{3}\pi\rho^2 h\left(\frac{3r - h}{2r - h}\right).$$


The Method of Compression 


Archimedes’ method of compression, for proving that a geometric magni- 
tude S (length, area, or volume) is equal to a given magnitude C, may be 
described in quite general terms as follows. On the basis of the geometry of 
the figure whose length, area, or volume is sought, two sequences $\{L_n\}$ and
$\{U_n\}$ are constructed such that

$$L_n < S < U_n \quad\text{and}\quad L_n < C < U_n \quad\text{for all } n. \tag{28}$$

In the “difference form” of the method it is proved that, given $\epsilon > 0$,

$$U_n - L_n < \epsilon \tag{29}$$

for n sufficiently large. In the “ratio form” of the method it is proved that,
given $\alpha > 1$,

$$\frac{U_n}{L_n} < \alpha \tag{30}$$


for n sufficiently large. In either case a double reductio ad absurdum proof 
finally establishes that S = C as desired. 
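
As a numerical illustration of conditions (28)-(30), the following Python sketch uses an example chosen here for convenience rather than one taken from the passage above: the unit circle, with $L_n$ and $U_n$ the areas of the inscribed and circumscribed regular n-gons and $C = \pi$. The function names and the doubling search are assumptions made for illustration, and floating-point arithmetic can only suggest, not prove, that $S = C$.

```python
import math

def polygon_bounds(n):
    """Areas of the regular n-gons inscribed in (L_n) and
    circumscribed about (U_n) the unit circle."""
    lower = 0.5 * n * math.sin(2 * math.pi / n)
    upper = n * math.tan(math.pi / n)
    return lower, upper

def n_for_difference(eps, n=4):
    """Double n until U_n - L_n < eps (difference form, condition (29))."""
    while True:
        lower, upper = polygon_bounds(n)
        if upper - lower < eps:
            return n
        n *= 2

def n_for_ratio(alpha, n=4):
    """Double n until U_n / L_n < alpha (ratio form, condition (30))."""
    while True:
        lower, upper = polygon_bounds(n)
        if upper / lower < alpha:
            return n
        n *= 2

if __name__ == "__main__":
    for eps in (1e-1, 1e-3, 1e-5):
        n = n_for_difference(eps)
        print("difference form:", eps, n, polygon_bounds(n))
    for alpha in (1.1, 1.001):
        n = n_for_ratio(alpha)
        print("ratio form:", alpha, n, polygon_bounds(n))
```

In both searches the candidate value $C = \pi$ stays trapped between $L_n$ and $U_n$, which is exactly the situation the double reductio ad absurdum exploits.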


EXERCISE 13. (a) Use (28) and (29) to prove that S=C. (b) Use (28) and (30) to 
prove that S=C. 


The Archimedean Spiral 


Greek mathematicians suffered from a paucity of curves available to serve 
as objects of their study. Because their algebra was geometric and rhetori- 
cal rather than numerical and symbolic, the introduction of new curves by 
means of equations (as in analytic geometry) was not feasible. In addition, 
Greek geometry was essentially static rather than dynamic in character. 
Consequently they could, for the most part, define curves only in terms of 
simple locus conditions (e.g., the circle as the locus of points equidistant 
from a given fixed point) or as intersections of given surfaces fixed in
position (e.g., a conic section as the intersection of a plane and a cone).

The static character of Greek geometry is a reflection of the very limited
role of motion and variability concepts in Greek science generally. Only 
the cases of uniform motion—either rectilinear or circular—were studied 
in any detail. Other motions (such as those of the planets in Greek 
astronomical models) could only be analyzed in terms of uniform linear 
and circular motions. 

It was in these terms that Archimedes defined his famous spiral—as the 
composition of a uniform linear motion and a uniform circular motion. 


If a straight line drawn in a plane revolve at a uniform rate about one 
extremity which remains fixed and return to the position from which it 
started, and if, at the same time as the line revolves, a point move at a 
uniform rate along the straight line beginning from the extremity which 
remains fixed, the point will describe a spiral in the plane (Definition 1 of
On Spirals).


To describe this curve in modern polar coordinates, let $\omega$ (in radians per unit time) be
the constant angular speed of rotation of the line, and v the constant speed
with which the point moves along the line, starting at the origin. Then the
polar coordinates of the moving point at time t are $r = vt$ and $\theta = \omega t$, so the
polar coordinate equation of the spiral is

$$r = a\theta \tag{31}$$

where $a = v/\omega$.

The first twenty propositions of the treatise On Spirals are devoted 
mainly to the determination of the tangent line to the spiral at a given 
point of it. Here, as elsewhere in Greek geometry, a static conception of a 
tangent line to a curve is employed—it is a straight line that “touches” the 
curve at a single point without crossing it. Although no dynamic considera- 
tions appear in the finished exposition, it has been conjectured that 
Archimedes discovered the tangent line to the spiral by means of a 
parallelogram of velocities determined by the two component motions 
generating the spiral (see the Appendix to Heath [6]). If so, this would be a 
rare (if not unique) instance of differential calculus methods in antiquity. 

At any rate, he shows that the tangent line at the point P to the spiral 
OPS intersects the perpendicular OQ to OP in a point T such that the line 
segment OT is equal in length to the circular arc PR intercepted between 
the polar axis and the radius vector OP (Fig. 24). The proof is a double 
reductio ad absurdum argument showing that the assumptions OT >PR 
and OT < PR lead to contradictions. 
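
The tangent property just stated can be checked numerically. The Python sketch below is an illustration under stated assumptions, not Archimedes' argument: it takes the velocity of the moving point to be the sum of a radial component v and a transverse component $\omega r$ (the conjectured parallelogram of velocities), intersects the resulting tangent line with the perpendicular to OP through O, and compares the length OT with the arc length $r\theta$. The function names are hypothetical.

```python
import math

def tangent_intercept(v, omega, t):
    """Length OT, where T is the intersection of the tangent line at P
    with the line through O perpendicular to OP."""
    theta = omega * t
    r = v * t
    er = (math.cos(theta), math.sin(theta))    # unit vector along OP
    et = (-math.sin(theta), math.cos(theta))   # unit vector perpendicular to OP
    px, py = r * er[0], r * er[1]              # the point P
    # Parallelogram of velocities: radial speed v, transverse speed omega * r.
    vx = v * er[0] + omega * r * et[0]
    vy = v * er[1] + omega * r * et[1]
    # Solve P + lam * (vx, vy) = mu * et for mu by Cramer's rule.
    det = vx * (-et[1]) - (-et[0]) * vy
    mu = (vx * (-py) - vy * (-px)) / det
    return abs(mu)

def arc_length(v, omega, t):
    """Length of the circular arc of radius OP between the polar axis and OP."""
    return (v * t) * (omega * t)

if __name__ == "__main__":
    for t in (0.5, 1.0, 3.0):
        print(t, tangent_intercept(2.0, 0.5, t), arc_length(2.0, 0.5, t))
```

For every sampled time the two printed lengths agree to rounding error, consistent with the statement that OT equals the arc PR.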


EXERCISE 14. Use a parallelogram of velocities argument to verify Archimedes’
construction of the tangent line to the spiral, for the case when P is the point
$(0, a\pi/2)$ on the y-axis resulting from one quarter-turn of the radius vector. At this
instant P has a vertical velocity component v and a horizontal velocity component
$\omega a\pi/2$ (why?).

Figure 24


EXERCISE 15. Suppose the spiral and its tangent line at the point P of the preceding 
exercise are given. Then explain how on this basis to “square the circle’—to 
construct by ruler and compass methods a square whose area is equal to that of the 
circle with radius OP. Recall the first proposition of On the Measurement of the 
Circle. 


EXERCISE 16. Show that the Archimedean spiral can be used to trisect an arbitrary 
angle, given that a line segment can easily be trisected (how?) See Figure 25. 


The final eight propositions of On Spirals are devoted to area computa- 
tions. For example, Archimedes proves that the area of the region S, 
bounded by one turn of the spiral and the line segment joining its initial 
and final points, is one-third that of the circle C centered at the initial 
point and passing through the final point (Fig. 26). That is, 


$$a(S) = \tfrac{1}{3}\pi(2\pi a)^2. \tag{32}$$


The proof of (32) makes use of the familiar (then as now) formulas for
the sum of the terms of an arithmetic progression, and of their squares:

$$1 + 2 + \cdots + n = \frac{n}{2}(n+1) \tag{33}$$

and

$$1^2 + 2^2 + \cdots + n^2 = \frac{n}{6}(n+1)(2n+1). \tag{34}$$

Writing (34) in the form

$$1^2 + 2^2 + \cdots + n^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6},$$

it is clear that

$$1^2 + 2^2 + \cdots + (n-1)^2 < \frac{n^3}{3} < 1^2 + 2^2 + \cdots + n^2, \tag{35}$$

which is what Archimedes actually needed.

Figure 25

Figure 26


EXERCISE 17. Prove by mathematical induction that formulas (33) and (34) hold for 
all n. That is, note that each of these formulas holds for n= 1, and show that its 
truth for n =k implies its truth for n= k +1. Why does this imply that it holds for 
all n? 


Figure 27

Archimedes’ geometric construction for the proof of (32) is very similar
to that which would be employed to set up a modern definite integral in
polar coordinates for this area computation (see Figure 27). We divide the
circle into n equal sectors as shown, with their bounding radii intersecting
the spiral in the points $O, A_1, A_2, \ldots, A_n$. If we write $OA_1 = b$, then

$$OA_1 = b, \quad OA_2 = 2b, \quad \ldots, \quad OA_n = nb.$$

Consequently we see that the spiral region S contains a region P consisting
of circular sectors with radii

$$0, b, \ldots, (n-1)b,$$

and in turn is contained in a region Q consisting of circular sectors with
radii

$$b, 2b, \ldots, nb.$$

Therefore $a(Q) - a(P)$ is equal to the area of a single sector of the circle C,
and this can be made as small as we please by choosing n sufficiently large.
Thus we have the necessary ingredients for a proof by compression.

If $a(S) < \tfrac{1}{3}a(C)$, we choose n sufficiently large that

$$a(Q) - a(P) < \tfrac{1}{3}a(C) - a(S),$$

so it follows that

$$a(Q) < \tfrac{1}{3}a(C).$$

But since the ratio of the areas of similar circular sectors is equal to the
ratio of the squares of their radii, we find that

$$\frac{a(Q)}{a(C)} = \frac{b^2 + (2b)^2 + \cdots + (nb)^2}{n(nb)^2} = \frac{1^2 + 2^2 + \cdots + n^2}{n^3} > \frac{1}{3}$$

by (35). This contradiction shows that $a(S)$ is not less than $\tfrac{1}{3}a(C)$.

Assuming that $a(S) > \tfrac{1}{3}a(C)$, we choose n sufficiently large that

$$a(Q) - a(P) < a(S) - \tfrac{1}{3}a(C),$$

so it follows that

$$a(P) > \tfrac{1}{3}a(C).$$

But then we find that

$$\frac{a(P)}{a(C)} = \frac{b^2 + (2b)^2 + \cdots + [(n-1)b]^2}{n(nb)^2} = \frac{1^2 + 2^2 + \cdots + (n-1)^2}{n^3} < \frac{1}{3}$$

by (35). This contradiction shows that $a(S)$ is not greater than $\tfrac{1}{3}a(C)$, so
we conclude that

$$a(S) = \tfrac{1}{3}a(C) = \tfrac{1}{3}\pi(2\pi a)^2$$

as desired. □

Figure 28
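
The compression argument for (32) can be watched numerically. The following Python sketch is an illustration with hypothetical function names: it builds the inscribed region P and the circumscribed region Q from circular sectors for one turn of $r = a\theta$ and compares both areas with one-third of the area of the circle C of radius $2\pi a$.

```python
import math

def sector_area(radius, angle):
    """Area of a circular sector with the given radius and central angle."""
    return 0.5 * radius ** 2 * angle

def spiral_bounds(a, n):
    """Areas of P (sector radii 0, b, ..., (n-1)b) and Q (radii b, ..., nb)
    for one turn of the spiral r = a*theta cut into n equal sectors."""
    angle = 2 * math.pi / n
    b = a * angle                      # OA_1, the radius after one sector
    inner = sum(sector_area(k * b, angle) for k in range(0, n))
    outer = sum(sector_area(k * b, angle) for k in range(1, n + 1))
    return inner, outer

if __name__ == "__main__":
    a = 1.0
    circle = math.pi * (2 * math.pi * a) ** 2
    for n in (4, 16, 256, 4096):
        inner, outer = spiral_bounds(a, n)
        print(n, inner / circle, outer / circle)
```

Both printed ratios approach 1/3 from opposite sides, and their difference is the area of a single sector of C divided by a(C), that is, 1/n.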


Archimedes went on to calculate the area of a spiral sector S (Fig. 28). If
the radii to the initial and final points on the spiral are $r_1$ and $r_2$, and the
central angle is $\theta$ (in radians), then

$$a(S) = \frac{\theta}{2}\left[r_1 r_2 + \tfrac{1}{3}(r_2 - r_1)^2\right]. \tag{36}$$


Although (36) may not be of great importance in itself, its proof seems 
worthy of inclusion as a demonstration of the computational mastery that 
Archimedes exhibited, despite the fact that he had to express his calcula- 
tions entirely in verbal and geometric language. 


The proof of (36) requires the formula for the sum of the squares of the
terms of an arbitrary finite arithmetic progression

$$a_1 = a, \quad a_2 = a + b, \quad a_3 = a + 2b, \quad \ldots, \quad a_n = a + (n-1)b$$

with n terms, initial term a, and common difference b. We find that

$$
\begin{aligned}
a^2 + (a+b)^2 + \cdots + [a + (n-1)b]^2
&= na^2 + 2a[b + 2b + \cdots + (n-1)b] + [b^2 + (2b)^2 + \cdots + (n-1)^2 b^2] \\
&= na^2 + 2ab[1 + 2 + \cdots + (n-1)] + b^2[1^2 + 2^2 + \cdots + (n-1)^2] \\
&= na^2 + abn(n-1) + \frac{b^2}{6}(n-1)(n)(2n-1) \\
&= n a_1 a_n + (a_n - a_1)^2\,\frac{2n^2 - n}{6(n-1)}
\end{aligned}
\tag{37}
$$

because $(n-1)b = a_n - a_1$ and $a^2 + ab(n-1) = a_1 a_n$.

Now what we actually need are the inequalities

$$a_1^2 + \cdots + a_{n-1}^2 < (n-1)\left[a_1 a_n + \tfrac{1}{3}(a_n - a_1)^2\right] < a_2^2 + \cdots + a_n^2 \tag{38}$$

generalizing inequalities (35). In order to prove the right-hand inequality in
(38), we need to see that

$$(n-1)\left[a_1 a_n + \tfrac{1}{3}(a_n - a_1)^2\right] < n a_1 a_n + \frac{2n^2 - n}{6(n-1)}(a_n - a_1)^2 - a_1^2,$$

or

$$(n-1)a_1 a_n + a_1^2 + \tfrac{1}{3}(n-1)(a_n - a_1)^2 < n a_1 a_n + \frac{2n^2 - n}{6(n-1)}(a_n - a_1)^2.$$

But it is obvious that

$$(n-1)a_1 a_n + a_1^2 < n a_1 a_n,$$

and

$$\frac{n-1}{3} < \frac{2n^2 - n}{6(n-1)}$$

is verified by cross-multiplication. The left-hand inequality in (38) can be
established similarly. □

Figure 29


Now we are ready for the proof of formula (36) for the area of the spiral
sector S. Let C be a circle with radius $r_2$ and center O, and let $\Sigma$ be the
sector of C with central angle $\theta$ containing the spiral sector S. Let $C'$ be a
circle of radius r such that

$$r^2 = r_1 r_2 + \tfrac{1}{3}(r_2 - r_1)^2,$$

and let $\Sigma'$ be a sector of $C'$ with central angle $\theta$. We want to prove that
$a(S) = a(\Sigma')$.

We subdivide $\Sigma$ into $n - 1$ equal sectors, each with central angle
$\theta/(n-1)$, and obtain figures P and Q inscribed in and circumscribed
about S, as in the previous proof. If

$$a = r_1 \quad\text{and}\quad b = \frac{r_2 - r_1}{n - 1},$$

then P consists of sectors with radii

$$a_1 = a, \quad a_2 = a + b, \quad \ldots, \quad a_{n-1} = a + (n-2)b,$$

while Q consists of sectors with radii

$$a_2 = a + b, \quad a_3 = a + 2b, \quad \ldots, \quad a_n = a + (n-1)b = r_2.$$

Assuming that $a(S) < a(\Sigma')$, choose n sufficiently large that

$$a(Q) < a(\Sigma').$$

But then

$$\frac{a(Q)}{a(\Sigma)} = \frac{a_2^2 + \cdots + a_n^2}{(n-1)a_n^2} > \frac{a_1 a_n + \tfrac{1}{3}(a_n - a_1)^2}{a_n^2} \ \text{(by (38))} \ = \frac{a(\Sigma')}{a(\Sigma)},$$

which implies that $a(Q) > a(\Sigma')$. This contradiction shows that $a(S)$ is not
less than $a(\Sigma')$.

Assuming that $a(S) > a(\Sigma')$, choose n sufficiently large that

$$a(P) > a(\Sigma').$$

But then

$$\frac{a(P)}{a(\Sigma)} = \frac{a_1^2 + \cdots + a_{n-1}^2}{(n-1)a_n^2} < \frac{a_1 a_n + \tfrac{1}{3}(a_n - a_1)^2}{a_n^2} = \frac{a(\Sigma')}{a(\Sigma)},$$

which implies that $a(P) < a(\Sigma')$. This contradiction shows that $a(S)$ is not
greater than $a(\Sigma')$, so we conclude that

$$a(S) = a(\Sigma') = \frac{\theta}{2}\left[r_1 r_2 + \tfrac{1}{3}(r_2 - r_1)^2\right]$$

as desired. □


If $\alpha$ and $\beta$ denote the initial and final angles of the sector S, substitution
of $\theta = \beta - \alpha$, $r_1 = a\alpha$, $r_2 = a\beta$ into (36) yields

$$a(S) = \frac{a^2}{6}(\beta^3 - \alpha^3).$$

Since integration in polar coordinates gives

$$a(S) = \frac{1}{2}\int_\alpha^\beta r^2\,d\theta = \frac{a^2}{2}\int_\alpha^\beta \theta^2\,d\theta,$$

formula (36) is thus equivalent to the fact that

$$\int_\alpha^\beta \theta^2\,d\theta = \tfrac{1}{3}(\beta^3 - \alpha^3).$$

Figure 30
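
The equivalence between formula (36) and the polar-coordinate integral can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names; the integral is approximated by a midpoint rule, so agreement holds only up to a small numerical error.

```python
def sector_formula(r1, r2, theta):
    """Archimedes' formula (36) for the area of a spiral sector."""
    return 0.5 * theta * (r1 * r2 + (r2 - r1) ** 2 / 3)

def polar_integral(a, alpha, beta, steps=100_000):
    """Midpoint-rule approximation of (1/2) * integral of (a*t)^2 dt
    for t running from alpha to beta."""
    h = (beta - alpha) / steps
    return sum(0.5 * (a * (alpha + (k + 0.5) * h)) ** 2 * h for k in range(steps))

if __name__ == "__main__":
    a, alpha, beta = 0.7, 1.0, 4.0
    print(sector_formula(a * alpha, a * beta, beta - alpha))
    print(a ** 2 * (beta ** 3 - alpha ** 3) / 6)
    print(polar_integral(a, alpha, beta))
```

The first two printed values are the two closed forms of the same area; the third is the numerical integral, which agrees with them to within the accuracy of the midpoint rule.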


EXERCISE 18. Consider in this problem multiple turns of the spiral $r = a\theta$. Let $A_n$
denote the area bounded by the nth turn (for $2\pi(n-1) \le \theta \le 2\pi n$) and the portion
of the polar axis joining its endpoints. For each $n \ge 2$ let $R_n = A_n - A_{n-1}$ denote the
area of the ring between the (n − 1)st and the nth turns (Fig. 30).

(a) Apply the formula (36) to calculate

$$A_1 = \tfrac{1}{3}\pi(2\pi a)^2 \quad\text{and}\quad A_2 = \tfrac{7}{12}\pi(4\pi a)^2.$$

Conclude that $R_2 = 6A_1$.
(b) For $n \ge 2$ derive Archimedes’ formula

$$R_{n+1} = \frac{nR_n}{n-1} = nR_2.$$


Solids of Revolution 


Archimedes’ use of the method of compression approached most closely to 
the construction underlying the modern definite integral in his treatise On 
Conoids and Spheroids. A “conoid” is what we would call a paraboloid or 
hyperboloid of revolution, and a “spheroid” is an ellipsoid of revolution. 
He showed that the volume of a paraboloid of revolution P inscribed in a
cylinder C with radius R and height H is

$$v(P) = \tfrac{1}{2}\pi R^2 H, \tag{39}$$

one-half of the volume of the circumscribed cylinder (Fig. 31).

Figure 31


To prove (39) we cut the paraboloid and the cylinder into n slices of
equal thickness $h = H/n$ by means of equidistant planes perpendicular to
the axis OA of the cylinder. Let $A_1, A_2, \ldots, A_n = A$ be the points of
intersection of these planes with the axis OA, and $B_1, \ldots, B_n = B$ their
intersections with one side OB of the parabola that generates the
paraboloid by revolution about OA (Fig. 32).

Now consider the solid of revolution I inscribed in the paraboloid P that
consists of $n - 1$ thin cylinders with radii

$$A_1B_1, \ldots, A_{n-1}B_{n-1},$$

and the circumscribed solid of revolution J that consists of n thin cylinders
with radii

$$A_1B_1, \ldots, A_nB_n.$$

Then

$$v(I) = \pi\left(\sum_{i=1}^{n-1} A_iB_i^2\right)h \quad\text{and}\quad v(J) = \pi\left(\sum_{i=1}^{n} A_iB_i^2\right)h, \tag{40}$$

so $v(J) - v(I) = \pi A_nB_n^2\,h$ can be made as small as we please by choosing n
sufficiently large, because $A_nB_n = R$ and $h = H/n$.

Figure 32
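
Notice that, since the defining property of the parabola gives $A_iB_i^2/R^2 = OA_i/OA = i/n$,

$$\frac{v(I)}{v(C)} = \frac{\pi\left(\sum_{i=1}^{n-1} A_iB_i^2\right)h}{\pi R^2 \cdot nh} = \frac{1 + 2 + \cdots + (n-1)}{n^2} < \frac{1}{2}, \tag{41}$$

while

$$\frac{v(J)}{v(C)} = \frac{\pi\left(\sum_{i=1}^{n} A_iB_i^2\right)h}{\pi R^2 \cdot nh} = \frac{1 + 2 + \cdots + n}{n^2} > \frac{1}{2}, \tag{42}$$

using the inequalities

$$1 + \cdots + (n-1) < \frac{n^2}{2} < 1 + \cdots + n$$

that follow immediately from the identity

$$1 + \cdots + n = \frac{n}{2}(n+1).$$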

If it were true that $v(P) > \tfrac{1}{2}\pi R^2 H = \tfrac{1}{2}v(C)$, we could choose n
sufficiently large that

$$v(I) > \tfrac{1}{2}v(C)$$

(why?), which contradicts (41). If it were true that $v(P) < \tfrac{1}{2}\pi R^2 H = \tfrac{1}{2}v(C)$,
we could choose n sufficiently large that

$$v(J) < \tfrac{1}{2}v(C),$$

which contradicts (42). We therefore conclude that

$$v(P) = \tfrac{1}{2}\pi R^2 H = \tfrac{1}{2}v(C)$$

as desired. □
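
The slicing construction lends itself to a direct numerical check. The Python sketch below is an illustration only (the function name is hypothetical): it stacks thin cylinders whose squared radii follow the parabola, so that the squared radius at distance x from the vertex is $R^2x/H$, and compares the inscribed and circumscribed volumes with half the volume of the cylinder C.

```python
import math

def paraboloid_bounds(R, H, n):
    """Volumes of the inscribed solid I (n-1 thin cylinders) and the
    circumscribed solid J (n thin cylinders) for n slices of thickness H/n."""
    h = H / n
    # Squared radius A_i B_i^2 of the i-th slice, from the parabola property.
    squared_radii = [R * R * (i * h) / H for i in range(n + 1)]
    inner = math.pi * sum(squared_radii[1:n]) * h        # i = 1, ..., n-1
    outer = math.pi * sum(squared_radii[1:n + 1]) * h    # i = 1, ..., n
    return inner, outer

if __name__ == "__main__":
    R, H = 2.0, 3.0
    cylinder = math.pi * R * R * H
    for n in (5, 50, 5000):
        inner, outer = paraboloid_bounds(R, H, n)
        print(n, inner / cylinder, outer / cylinder)
```

The two ratios bracket 1/2 for every n, and their difference is $\pi R^2 h / v(C) = 1/n$, matching the observation that $v(J) - v(I)$ can be made as small as we please.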


In order to interpret this result as an integral, let us turn the parabola on
its side (Fig. 33), so it takes the form

$$y^2 = \frac{R^2}{H}x, \qquad 0 \le x \le H.$$

If we subdivide the interval $[0, H]$ into n equal subintervals,

$$0 = x_0 < x_1 < \cdots < x_n = H,$$

and write $y_i^2 = (R^2/H)x_i$, then we see that

$$v(I) = \pi\left(\sum_{i=1}^{n-1} y_i^2\right)h = \sum_{i=1}^{n-1} \frac{\pi R^2 x_i h}{H}$$

and

$$v(J) = \pi\left(\sum_{i=1}^{n} y_i^2\right)h = \sum_{i=1}^{n} \frac{\pi R^2 x_i h}{H}$$

are simply the lower and upper Riemann sums for the integral

$$\int_0^H \frac{\pi R^2 x}{H}\,dx,$$

so Archimedes’ analysis shows that

$$\int_0^H \frac{\pi R^2 x}{H}\,dx = \tfrac{1}{2}\pi R^2 H \quad\text{or}\quad \int_0^H x\,dx = \tfrac{1}{2}H^2.$$

Figure 33

We have seen that the volume of a sphere is two-thirds that of the
circumscribed cylinder. In On Conoids and Spheroids Archimedes shows
that this relation generalizes to the case of an ellipsoid of revolution
inscribed in a cylinder. That is, if the ellipse (Fig. 34)

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

with semi-axes a and b is revolved around the y-axis, then the volume of
the ellipsoid E we obtain is

$$v(E) = \tfrac{4}{3}\pi a^2 b. \tag{43}$$

Figure 34

Figure 35


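Archimedes’ proof of (43) is based on precisely the same construction as
in the case of the paraboloid, applied to the upper half $E'$ of the ellipsoid
E, and with the same notation, except that R and H are replaced by a and
b respectively. The volumes of the inscribed and circumscribed solids are
still given by formulas (40),

$$v(I) = \pi\left(\sum_{i=1}^{n-1} A_iB_i^2\right)h \quad\text{and}\quad v(J) = \pi\left(\sum_{i=1}^{n} A_iB_i^2\right)h.$$

The difference here is that

$$\frac{A_iB_i^2}{a^2} = \frac{x_i^2}{a^2} = 1 - \frac{y_i^2}{b^2} = 1 - \frac{(n-i)^2h^2}{b^2} = \frac{b^2 - (n-i)^2h^2}{b^2} = \frac{n^2 - (n-i)^2}{n^2}$$

because $b = nh$. It follows that

$$\frac{v(I)}{v(C)} = \frac{\pi\left(\sum_{i=1}^{n-1} A_iB_i^2\right)h}{\pi a^2 \cdot nh} = \frac{(n-1)n^2 - \left(1^2 + \cdots + (n-1)^2\right)}{n^3} = 1 - \frac{1^2 + 2^2 + \cdots + n^2}{n^3} < \frac{2}{3}$$

because we know from (35) that

$$1^2 + \cdots + (n-1)^2 < \tfrac{1}{3}n^3 < 1^2 + \cdots + n^2.$$

Similarly, we see that

$$\frac{v(J)}{v(C)} = \frac{\pi\left(\sum_{i=1}^{n} A_iB_i^2\right)h}{\pi a^2 \cdot nh} = \frac{n \cdot n^2 - \left(1^2 + \cdots + (n-1)^2\right)}{n^3} = 1 - \frac{1^2 + \cdots + (n-1)^2}{n^3},$$

so that

$$\frac{v(J)}{v(C)} > \frac{2}{3}.$$

The facts that

$$v(I) < v(E') < v(J), \qquad v(I) < \tfrac{2}{3}v(C) < v(J)$$

and that $v(J) - v(I)$ can be made arbitrarily small (by choosing n
sufficiently large) now imply in the usual way that

$$v(E') = \tfrac{2}{3}v(C)$$

as desired, C being a cylinder with radius a and height b. □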

Archimedes applied this “slicing construction” to calculate the volumes 
of segments of paraboloids, ellipsoids, and hyperboloids of revolution cut 
off by planes not necessarily perpendicular to the axes of these figures. In 
these investigations he employed a common general procedure for the 
solution of several different but similar volume problems. From this point 
it was a relatively short logical step (albeit one that was not taken for 
almost nineteen centuries) to the formulation of a general concept of 
integration. 


EXERCISE 19. Apply a slicing construction like that used in On Conoids and 
Spheroids to prove the formulas for the volume of a cone or a pyramid. 


EXERCISE 20. Use a slicing construction to prove that the volume of an elliptical
cone, with height h and base the ellipse $x^2/a^2 + y^2/b^2 = 1$, is $V = \tfrac{1}{3}\pi abh$. Show first
that the area $A_z$ of the horizontal cross-section at a height z above the base is
$A_z = \pi ab(h - z)^2/h^2$.

EXERCISE 21. Use a slicing construction to prove that the volume of the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$

is $V = \tfrac{4}{3}\pi abc$. Show first that the area $A_z$ of the horizontal cross-section at height z
above the xy-plane is

$$A_z = \pi ab\left(1 - \frac{z^2}{c^2}\right).$$
c 


The Method of Discovery 


The works of Archimedes only barely survived the so-called Dark Ages
from 500 to 1000 A.D. Of the treatises mentioned previously in this chapter,
only two (Measurement of a Circle and On the Sphere and Cylinder) seem
to have been generally known at the time of the Archimedean commentator
Eutocius in the sixth century. Almost all modern translations of
Archimedes’ works stem from a single Greek manuscript that was copied
from an earlier original at Constantinople in the ninth or tenth century,
was translated into Latin in the thirteenth century, and eventually
disappeared without a trace in the sixteenth century. The main exception is a
treatise entitled The Method that was rediscovered virtually by accident on
a palimpsest parchment in Constantinople in 1906 after having been lost
since the early centuries of our era. Archimedes’ work had been
(fortunately) imperfectly erased in about the thirteenth century and replaced
with liturgical writings. In this now restored treatise he had detailed his
method of discovery, rather than having deliberately concealed it as Wallis
and others had speculated.

After stating in the preface to The Method a couple of new theorems to 
be discussed, he adds his intent to “‘explain in detail in the same book the 
peculiarity of a certain method,” to which he attributes the discovery on 
heuristic grounds of many of his results, prior to providing them with 
rigorous proofs (by the method of exhaustion or compression). 


For certain things first became clear to me by a mechanical method, 
although they had to be demonstrated by geometry afterwards because 
their investigation by the said method did not furnish an actual demon- 
stration. But it is of course easier, when we have previously acquired, by 
the method, some knowledge of the questions, to supply the proof than it 
is to find it without any previous knowledge... I am persuaded that it 
will be of no little service to mathematics; for I apprehend that some, 
either of my contemporaries or of my successors, will, by means of the 
method when once established, be able to discover other theorems in 
addition, which have not yet occurred to me (preface to The Method [5]).

Figure 36

Archimedes’ “mechanical method” is based on the law of the lever,
according to which a finite system of point masses $m_1, \ldots, m_p$ at distances
$d_1, \ldots, d_p$ from the fulcrum, on one side of a (weightless) lever, balances
another system $m'_1, \ldots, m'_q$ on the other side at distances $d'_1, \ldots, d'_q$
(Fig. 36) provided that

$$\sum_{i=1}^{p} m_i d_i = \sum_{j=1}^{q} m'_j d'_j. \tag{44}$$


Archimedes had written an earlier treatise On the Equilibrium of Planes 
devoted to geometric applications and generalizations of the law of the 
lever. In it he proved that the centroid of a triangle (the point about which 
it balances) is the common point of intersection of the medians from the 
vertices to the midpoints of the opposite sides. As is well known from 
elementary geometry, this point lies two-thirds of the way from each vertex 
to the midpoint of the opposite side. 

In its simplest form, the mechanical method for the investigation of 
areas and volumes can be described as follows. Suppose that R and S are 
two convex regions lying along the same interval of a horizontal axis L 
(Fig. 37). Given the area $a(S)$ and the centroid $c_S$ of S, we inquire as to the
area $a(R)$ of R.

We regard the two regions as plane laminas of unit density, each 
consisting of an indefinitely large number of elements—line segments or 
strips of infinitesimal width—perpendicular to L, and think of L as a lever 
with fulcrum O. Suppose there is a constant k such that, for each vertical 
line at a distance x from O intersecting the regions R and S in line
segments with lengths $l$ and $l'$ respectively, we can show that

$$k \cdot l = x \cdot l'. \tag{45}$$

Figure 37

Figure 38


Then the law of the lever implies that the segment $l$, if placed at the point P
at a distance k from O, balances the segment $l'$ where it is. It seems to
follow that, if the region R is placed with its centroid at P, then it will
balance the region S in place where it is, so that

$$a(R) \cdot k = a(S) \cdot \bar{x}_S, \tag{46}$$

where $\bar{x}_S$ is the distance from O to the centroid $c_S$ of S (assuming that each
region acts as a point mass placed at its centroid). Since k, $a(S)$, and $\bar{x}_S$
are assumed known, we can solve (46) for $a(R)$. Conversely, if $a(R)$, $a(S)$,
and k were known in advance, we could solve (46) to discover the centroid
of S.

As a first example, we take R as the region bounded by the parabola
$y = x^2$, the x-axis, and the vertical line $x = 1$ (Fig. 38). Let S be the triangle
with vertices $(0, 0)$, $(1, \tfrac{1}{2})$, $(1, -\tfrac{1}{2})$, whose area is $\tfrac{1}{2}$ and whose centroid is
the point $(\tfrac{2}{3}, 0)$. Then $l = x^2$ and $l' = x$, so we can take $k = 1$ in (45), i.e.
$1 \cdot x^2 = x \cdot x$. Then (46) gives

$$a(R) = a(S) \cdot \bar{x}_S = \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{1}{3}.$$


The computation is similar (although not identical) to Archimedes’ 
mechanical investigation of the area of a segment of a parabola. 
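
The balancing argument of this example can be imitated numerically by replacing the "indefinitely large number of elements" with finitely many thin strips. The Python sketch below is an illustration, not Archimedes' procedure: the function names are hypothetical, and the strips are assumed to lie over the interval [0, 1] as in the example, with $w_S(x) = x$ the width of the triangle S and $k = 1$.

```python
def area(width, n=100_000):
    """Midpoint-strip approximation of the area under width(x) on [0, 1]."""
    dx = 1.0 / n
    return sum(width((i + 0.5) * dx) * dx for i in range(n))

def moment(width, n=100_000):
    """Moment about the fulcrum O of the strips of the region with the
    given width function on [0, 1]."""
    dx = 1.0 / n
    return sum((i + 0.5) * dx * width((i + 0.5) * dx) * dx for i in range(n))

def area_by_lever(width_S, k):
    """Area a(R) from (46): a(R) * k = a(S) * (centroid distance of S),
    where the right-hand side is the moment of S about O."""
    return moment(width_S) / k

if __name__ == "__main__":
    triangle = lambda x: x          # width l' of the triangle S at x
    parabola = lambda x: x * x      # width l of the parabolic region R at x
    print("a(S) =", area(triangle))
    print("centroid of S at x =", moment(triangle) / area(triangle))
    print("a(R) by the lever =", area_by_lever(triangle, 1.0))
    print("a(R) directly =", area(parabola))
```

The lever value and the direct strip sum for $a(R)$ agree, which is the content of the claim that (45) implies (46).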

As he remarks, “the fact here stated is not actually demonstrated by the
argument used; but that argument has given a sort of indication that the
conclusion is true.” The reason the argument is not rigorous (as it stands)
is that a plane region does not consist of a finite collection of line
segments, whereas we have applied the law of the lever as stated for finite
collections of masses. However, the needed fact that (45) implies (46),
which Archimedes did not prove, is easily verified using integrals. For if
we write $l = w_R(x)$ and $l' = w_S(x)$ for the widths of R and S above x, then

$$a(R) = \int w_R(x)\,dx = \frac{1}{k}\int x\,w_S(x)\,dx = \frac{a(S)}{k} \cdot \frac{\int x\,w_S(x)\,dx}{\int w_S(x)\,dx} = \frac{a(S)\,\bar{x}_S}{k},$$

so it is clear that (46) does indeed follow from (45). In these terms, the
effect of “the method” is to express a desired integral (that for $a(R)$) in
terms of another integral (that for $a(S)$) which is already known.

Figure 39

If R and S are solid regions, then the widths in the above discussion are 
replaced by the cross-sectional areas of R and S in the plane perpendicular 
to L at x. As an example of this case we give a mechanical derivation of 
Archimedes’ favorite result—the formula for the volume of a sphere. We 
will use his geometric construction (for Proposition 2 of The Method), but 
with Cartesian coordinate computations instead of his arguments in terms 
of similar triangles. 

Starting with the circle $x^2 + y^2 = r^2$ intersecting the positive x-axis at the
point $P = (r, 0)$, let KLMN be a rectangle centered at the origin O with
base $d = 2r$ and height $2d$, and consider also the triangle KNP (Fig. 39). By
revolving these three figures around the x-axis, we generate a sphere S, a
cone C, and a cylinder Z. We consider these three solids as being made up
of circular disks perpendicular to the x-axis. For example, the plane
perpendicular to the x-axis at the point $A = (x, 0)$ intersects the sphere S in
a circle $S_x$ with radius $AC = y = \sqrt{r^2 - x^2}$, the cone C in a circle $C_x$ with
radius $AB = r - x$, and the cylinder Z in a circle $Z_x$ with radius $AD = d$.
Now

$$
\begin{aligned}
d[a(S_x) + a(C_x)] &= \pi d[y^2 + (r - x)^2] \\
&= \pi d[(r^2 - x^2) + (r^2 - 2rx + x^2)] \\
&= \pi d(2r^2 - 2rx) \\
&= \pi d^2(r - x),
\end{aligned}
$$

that is,

$$d[a(S_x) + a(C_x)] = (r - x)\,a(Z_x). \tag{47}$$


This implies that, if the circles $S_x$ and $C_x$ are placed at the point
$Q = (3r, 0)$ a distance d to the right of P, then together they will balance
the circle $Z_x$ where it is, considering the x-axis as a lever with fulcrum P. It
follows that, if the sphere S and the cone C are placed with their centroids
at Q, then together they will balance the cylinder Z where it is. Since by
symmetry the centroid of Z is at the origin O, the law of the lever gives

$$2r[v(S) + v(C)] = r\,v(Z). \tag{48}$$

Substituting the known volumes $v(C) = \pi d^3/3$ and $v(Z) = \pi d^3$ into (48),
we obtain

$$v(S) = \tfrac{1}{6}\pi d^3 = \tfrac{4}{3}\pi r^3.$$


Archimedes indicates that this was his original derivation of the volume of 
a sphere, from which he inferred (as described previously) the formula for 
the surface area of the sphere. 
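
The two steps of this derivation, the slice-by-slice balance (47) and the lever equation (48), can be checked with a short Python sketch. The function names are hypothetical, and the cone and cylinder volumes are computed from the dimensions given in the construction (cone: base radius d and height d; cylinder: radius d and length d).

```python
import math

def slice_balance(r, x):
    """Both sides of (47) for the plane at x, with fulcrum P = (r, 0)."""
    d = 2 * r
    area_sphere = math.pi * (r * r - x * x)     # circle S_x, radius sqrt(r^2 - x^2)
    area_cone = math.pi * (r - x) ** 2          # circle C_x, radius r - x
    area_cylinder = math.pi * d * d             # circle Z_x, radius d
    return d * (area_sphere + area_cone), (r - x) * area_cylinder

def sphere_volume_from_lever(r):
    """Solve (48), 2r [v(S) + v(C)] = r v(Z), for v(S)."""
    d = 2 * r
    v_cone = math.pi * d ** 3 / 3
    v_cylinder = math.pi * d ** 3
    return v_cylinder / 2 - v_cone

if __name__ == "__main__":
    r = 1.5
    for x in (-1.5, -0.5, 0.0, 0.75, 1.5):
        print(x, slice_balance(r, x))
    print(sphere_volume_from_lever(r), 4 * math.pi * r ** 3 / 3)
```

Each printed pair from `slice_balance` has equal entries, and the volume obtained from the lever equation matches $\tfrac{4}{3}\pi r^3$.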


EXERCISE 22. Obtain from this same construction the formula

$$v = \tfrac{1}{3}\pi\rho^2 h\left(\frac{3r - h}{2r - h}\right)$$

for the volume of the smaller spherical segment cut off by the plane $x = a$ (where
$0 < a < r$), having base radius $\rho = \sqrt{r^2 - a^2}$ and height $h = r - a$. Use the portions
of the cone and cylinder cut off by this same plane.


EXERCISE 23. Let P be the segment of a paraboloid obtained by revolving the
parabola $y^2 = x$, $0 \le x \le 1$, about the x-axis. Use Archimedes’ mechanical method
to deduce that the volume of P is one-half that of the circumscribed cylinder Z
(Fig. 40).

Figure 40


EXERCISE 24. Balance the paraboloid (where it is) against the inscribed cone C 
(concentrated at an appropriate point) to show that the centroid of the paraboloid 
is two-thirds of the way from its vertex to its base. 


After applying his mechanical technique to compute the volumes and 
centroids of segments of ellipsoids, paraboloids, and hyperboloids of 
revolution, Archimedes concludes The Method with computations of the 
volumes of two special solids that are standard examples in modern 
calculus textbooks. In Proposition 14 he proves that the volume of the 
wedge W, cut from a cylinder of diameter d and height h = d/2 by a plane 
through a diameter of one base and a point of the circumference of the 
other (Fig. 41), iso(W)=d°?/12. In Proposition 15 he shows that: the 
volume of the region S* common to two cylinders with diameter d and 
perpendicular intersecting axes is v(S*)=2d°/3. Figure 42 shows the first 


octant of the region S*. 


Figure 41 


Cross-section 
at x, square 


of edge ,/r? — x? 


AX 


2.8, 


oe 
aa 


*eweeecaan ean 
[MPU EEEMEE ',. 


Zz x? t+2z2=Pr- 


Figure 42 

The calculation of the volume of $S^*$ employs the same construction as
indicated in Figure 39, except with circular cross-sections replaced by
square cross-sections. The cross-section of $S^*$ perpendicular to the x-axis
at A is a square $S^*_x$ with edge $2AC = 2\sqrt{r^2 - x^2}$ (see Figures 39 and 42).
Let $C^*$ be the pyramid with vertex P and square base with edge $KN = 2d$.
Then the cross-section of $C^*$ perpendicular to the x-axis at A is a square
$C^*_x$ with edge $2AB = 2(r - x)$. Let $Z^*$ be the rectangular solid with height
$KL = d$ and square base with edge $KN = 2d$. Its cross-section $Z^*_x$
perpendicular to the x-axis at A is then a square with area $4d^2$.

Upon adding asterisks and replacing the constant $\pi$ with 4, the
derivation of equation (47) now gives

$$d[a(S^*_x) + a(C^*_x)] = (r - x)\,a(Z^*_x). \tag{47*}$$

In the same way that equation (47) implies equation (48), it follows from
(47*) that

$$2r[v(S^*) + v(C^*)] = r\,v(Z^*). \tag{48*}$$

Substituting $v(Z^*) = 4d^3$ and $v(C^*) = \tfrac{1}{3}v(Z^*) = \tfrac{4}{3}d^3$ into (48*), we finally
obtain

$$v(S^*) = \tfrac{2}{3}d^3. \tag{49}$$

Since it is evident that $S^*$ is the union of eight copies of the wedge W of
Figure 41, it follows from (49) that

$$v(W) = \tfrac{1}{12}d^3. \tag{50}$$


However, Archimedes gave a separate mechanical derivation of (50), as 
well as rigorous compression proofs of both (49) and (50). 
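
Formulas (49) and (50) can be confirmed by summing square cross-sections directly. The Python sketch below is a numerical illustration with a hypothetical function name; it uses the cross-section described above, a square of edge $2\sqrt{r^2 - x^2}$ at position x, and a midpoint rule, so the agreement is approximate.

```python
def bicylinder_volume(d, n=100_000):
    """Midpoint-rule sum of the square cross-sections of S*, each of
    edge 2*sqrt(r^2 - x^2) and hence area 4*(r^2 - x^2), for -r <= x <= r."""
    r = d / 2
    dx = d / n
    total = 0.0
    for i in range(n):
        x = -r + (i + 0.5) * dx
        total += 4 * (r * r - x * x) * dx
    return total

if __name__ == "__main__":
    d = 2.0
    volume = bicylinder_volume(d)
    print(volume, 2 * d ** 3 / 3)        # compare with (49)
    print(volume / 8, d ** 3 / 12)       # one wedge W, compare with (50)
```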

In its treatment of plane areas in terms of line elements and volumes in 
terms of area elements, Archimedes’ mechanical method was a precursor 
to the “indivisibles” techniques that flourished in the early seventeenth 
century, and which led directly to the rapid development of the calculus. 


Archimedes and Calculus? 


We have seen in this chapter that Archimedes solved many of the prob- 
lems that are staples of modern calculus courses, and in particular that his 
solutions can often be interpreted as computations of definite integrals of 
the form

$$\int_c^d (ax + bx^2)\,dx.$$


On this basis various authors have credited Archimedes with the original 
discovery of the calculus. 

While it is true that Archimedes’ work ultimately (in the seventeenth 
century) gave birth to the calculus, three indispensable ingredients of the 
calculus are missing in his methods: 

(1) The explicit introduction of limit concepts. Archimedes, at least in 
his formal proofs if not in his informal analyses, shared the Greek “horror 
of the infinite.” The Greek concept of rigor demanded the cumbersome 
double reductio ad absurdum argument rather than a simple passage to the 
limit. 

(2) A general computational algorithm for the calculation of areas and 
volumes. A distinctive feature of the calculus is the formulation of general 
procedures for the exploitation of analogies between different but similar 
problems to lessen the burden of duplication in their solutions. By con- 
trast, Archimedes (with very few exceptions) started from scratch in each 
computation, basing the solution of each problem on a construction 
determined by the special geometric features of that particular problem, 
and without taking advantage of previous solutions of similar problems. 
The reliance upon geometric algebra without any simplifying symbolic 
notation was a substantial impediment to the identification and codifica- 
tion of computational features common to different problems. 

(3) A recognition of the inverse relationship between area and tangent 
problems. The Greek view of tangent lines as merely “touching” lines was 
inadequate to provide any hint of this relationship or of “rate of change” 
interpretations. 

Nevertheless, the investigations of Archimedes continue to serve as 
exemplars of originality and precision. His progress with the mathematical 
tools available to him will always be one of the great landmarks in the 
history of mathematics. 
Introduction 


The classical era of Greek mathematical development stretched over a 
period of approximately ten centuries, from about 600 B.C. to 400 A.D. 
However, it reached an early climax in the third century B.C. with the work 
of Archimedes and that of his younger contemporary Apollonius, who 
elaborated a comprehensive theory of the conic sections. Coincident with 
the establishment of Roman power in the Mediterranean area during the 
second century B.C., Hellenistic culture in general, and Greek theoretical 
mathematics in particular, began a period of decline that produced no new 
contributions comparable to those of Eudoxus, Euclid, Archimedes, and 
Apollonius. 

With the collapse of the Western Roman Empire in the fifth century 
A.D., Western Europe entered a long dark age during which the scientific 
accomplishments of the past were only dimly remembered, when at all. 
For several centuries after the fall of Rome, the Greek legacy was confined 
to the beleaguered Byzantine or Eastern Roman Empire, where it was 
preserved but hardly flourished. In 529 A.D. the emperor Justinian closed 
the Greek schools of “pagan” philosophy at Athens, including Plato’s 
Academy which had survived for nine centuries. 

The principal center of surviving Greek learning at Alexandria fell in 
641 A.D. to the Moslems, who during the seventh century rapidly con- 
quered many of the territories immediately surrounding the Medi- 
terranean, and established there a stable culture that prospered for at least 
four centuries. The Moslems eagerly absorbed the available repository of 
Greek science and mathematics, together with Indian and Babylonian 
elements, and during the ninth and tenth centuries the works of Euclid, 
Archimedes, Apollonius, and Ptolemy were translated from Greek into 
Arabic. 
The boundaries of Arab jurisdiction in the west were pushed back in the 
eleventh century. With the fall of Toledo (Spain) in 1085 and Sicily in 
1091, the Greek classics in Arabic were again accessible to Christian 
Western Europe, and were translated into Latin during the twelfth and 
thirteenth centuries. The Greek mathematical works thus reacquired, and 
the medieval scholastic speculations on motion, variability, and the in- 
finite, together with the symbolic algebra and analytic geometry of the late 
Renaissance, formed the rich amalgam that fueled the seventeenth century 
explosion of infinitesimal mathematics. In this chapter we outline briefly 
the significant features of the decline of Greek mathematics, the absorp- 
tion and transmission of the mathematics of antiquity by the Islamic 
culture, and the eventual rebirth of mathematical progress in Western 
Europe. 


The Decline of Greek Mathematics 


After the golden age of the third and fourth centuries B.C., theoretical 
progress ceased and Greek mathematics turned toward applications of a 
sort that failed to stimulate further progress in mathematics itself. The 
most significant work of the next four centuries was the mathematical 
astronomy and associated applied trigonometry of Hipparchus (second 
century B.C.) and Ptolemy (second century A.D.). Scholarly commentaries 
on the Greek geometrical masterworks were written by Pappus, Proclus, 
and Eutocius in the fourth, fifth, and sixth centuries A.D., but the earlier 
level of originality in geometrical analysis was never regained. Although 
Archimedes had seemingly paved the way for a development of infinitesi- 
mal techniques, his work had to wait eighteen centuries for its continua- 
tion. 

Changing political and social conditions probably played some role in 
this decline. Greek science, despite its brilliant achievements, was apparen- 
tly a fragile enterprise, centering on a relatively small number of profes- 
sional workers concentrated in a few centers (such as Athens and 
Alexandria), and dependent on both favorable intellectual conditions and 
royal subsidies for its continued success. Wars and the end of Hellenistic 
prosperity under Roman domination terminated these favorable condi- 
tions. Archimedes was killed in the sack of Syracuse, and a large part of 
the famous library at Alexandria was burned during the Roman siege of 
that city. The Romans were an intensely practical people who undertook 
great construction projects (bridges, highways, viaducts, etc.) on the basis 
of rule of thumb procedures, but had no interest in and did not support 
abstract and theoretical studies. 

However, these vagaries of the external ancient world were not by 
themselves responsible for the failure of Greek analysis to advance materi- 
ally beyond Archimedes. There were also internal mathematical factors 
that suffice to explain this failure. These impeding factors centered on the 
rigid separation in Greek mathematics between geometry and arithmetic 
(or algebra), and a one-sided emphasis on the former. Their analysis dealt 
solely with geometrical magnitudes (lengths, areas, volumes) rather than 
numerical ones, and their manipulation of these magnitudes was exclu- 
sively verbal or rhetorical, rather than symbolic (or algebraic, as we would 
say today). As a consequence, their cumbersome computations disguised 
the analogies between solutions of similar problems, and thereby prevented 
the recognition and codification of general computational algorithms that 
could be applied to whole classes of similar problems. In short, the Greeks’ 
thoroughgoing geometrization of mathematics, to the exclusion of its 
algebraic aspect, effectively precluded the growth of an algorithmic tradi- 
tion based on generally applicable methods. 

It is somewhat paradoxical that this principal shortcoming of Greek 
mathematics stemmed directly from its principal virtue—the insistence on 
absolute logical rigor. The Greeks imposed on themselves standards of 
exact thought that prevented them from using and working with concepts 
that they could not completely and precisely formulate. For this reason 
they rejected irrationals as numbers, and excluded all traces of the infinite, 
such as explicit limit concepts, from their mathematics. Although the 
Greek bequest of deductive rigor is the distinguishing feature of modern 
mathematics, it is arguable that, had all succeeding generations also 
refused to use real numbers and limits until they fully understood them, 
the calculus might never have been developed, and mathematics might 
now be a dead and forgotten science. 

The Babylonians had handed down a working body of algebraic (though 
still rhetorical) techniques for the solution of problems involving linear and 
quadratic equations, based on the uncritical manipulation of numbers and 
unquestioning representation of all quantities in terms of numbers, as well 
as the free use of convenient approximations. However, the Pythagorean 
discovery of incommensurable line segments meant that geometric magni- 
tudes could not be measured by numbers as the Greeks understood them, 
and the requirement of exact solutions rendered approximations pointless. 

Consequently the handy algebra of Babylonia was converted into a 
ponderous geometric algebra based on the technique of application of 
areas. It no longer made sense to say that the area of a circle is $\pi r^2$; one 
had to express the result in terms of Eudoxian proportions between pairs 
of comparable areas (e.g., circle to circle as square to square). A magnitude 
could not be expressed precisely in terms of a simple symbol like a 
number, but only in terms of a line segment constructed in prescribed 
fashion on a geometric figure. Algebraic operations became geometric 
constructions (the product of two linear magnitudes being a rectangular 
area, etc.). What we would call higher degree equations were meaningless 
because they did not correspond to geometric magnitudes in three dimen- 
sions. Hence complicated computations had to be expressed in terms of 
cumbersome geometric transformations of proportions. 
Because of these difficulties that are inherent in geometric algebra and 
the formal theory of proportions, only a gifted Greek mathematician could 
carry out computations that today can be handled by a school child using 
elementary algebraic notation. We can understand the formidable 
sentences of Greek geometric algebra only by transforming them into 
concise formulas. Van der Waerden has emphasized the importance of a 
continuous oral tradition to compensate for the thorniness of Greek 
mathematical exposition: “As long as there was no interruption, as long as 
each generation could hand over its method to the next, everything went 
well and the science flourished. But as soon as some external cause 
brought about an interruption in the oral tradition, and only books 
remained, it became extremely difficult to assimilate the work of the great 
precursors and next to impossible to pass beyond it” [14], p. 266. To the 
evident difficulties of an exclusively written tradition may be added the 
accounts of recurrent burnings of hundreds of thousands of books at 
the great Alexandria library by successive waves of military invaders and 
religious fanatics. 

In summary, it was probably a combination of external and internal 
factors that brought a halt to the great enterprise of Greek theoretical 
mathematics, and delayed for many centuries the harvest of its eventual 
fruits. 


Mathematics in the Dark Ages 


The traditional date for the end of the Roman Empire in the West is 476, 
when the resident Roman emperor was displaced by a Goth intruder. The 
fall of Rome was followed by a breakdown of central government and the 
dissolution of urban life, and continuity of scientific development in 
Western Europe was lost altogether. For the next several centuries, the 
remnants of the Western Roman Empire, mainly in the form of the 
Catholic Church, were occupied with the task of civilizing barbarian tribes 
so primitive that they had no written language, let alone any science or 
culture. 

Only the Latin encyclopedists preserved any connection, however tenu- 
ous, with the intellectual treasures of the past. It had been fashionable for 
cultured Romans to acquire a nodding acquaintance (though seldom any 
real understanding) with Greek science and philosophy. In order to meet 
this need, popularizers incorporated palatable condensations of scientific 
results into handbooks and manuals that were often superficial “scissors 
and paste” jobs. Although these books, taken together, presented an 
unorganized mass of frequently contradictory facts and myths, they con- 
stituted virtually the only sources of general scientific information in 
Western Europe during the early middle ages. 


In regard to mathematics, the most important of these Roman writers 
was Boethius (ca. 480-524), who wrote four elementary textbooks—in 
arithmetic, geometry, astronomy, and music—that served as the basis for 
the quadrivium of the medieval monastic schools. His Arithmetic was an 
abridgement of the Introductio arithmeticae of Nicomachus (ca. 100 A.D.). 
This latter work was basically a compilation of Pythagorean and Platonic 
number lore; its level may be judged by the fact that its author saw fit to 
include a 10 by 10 multiplication table. The Geometry of Boethius con- 
sisted of only the statements, without proofs, of the simpler propositions in 
the first four books of Euclid’s Elements. 

From this beginning, the level of scholarship in Western Europe gener- 
ally declined until the time of Gerbert, a Frenchman who became Pope 
Sylvester II (999-1003). Gerbert traveled to Spain to learn some of the 
mathematics and science of the Arab world, and apparently rekindled an 
emphasis on rudimentary mathematical instruction in the church schools 
that were, until the emergence of universities in the late twelfth century, 
the main centers of learning in Western Europe. Even so, geometry in this 
period consisted only of the enumeration of the first few theorems of 
Euclid, without any logical sequence or semblance of proofs. The Pytha- 
gorean theorem had apparently been long since forgotten. 

The most famous eleventh century mathematician (in Western Europe) 
seems to have been Franco of Liege, who wrote a much-quoted book on 
the quadrature of the circle. Thinking that the approximation 11/14 = 
(1/4)(22/7) was an exact value for the ratio of the area of a circle to that 
of the circumscribed square, he thought he could solve the classical 
problem of “squaring the circle” by somehow constructing a square with 
area equal to that of a 11 by 14 rectangle, and hence to that of a circle of 
radius 7. It is to this latter construction of a square that his work is largely 
devoted! Clearly an infusion of ancient Greek knowledge was necessary to 
enable Western Europe to rise above the intellectual morass of the dark 
ages. 
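
To see how close Franco's assumed ratio actually is, one can compare 11/14 with the true ratio π/4 of a circle's area to that of its circumscribed square. The short Python sketch below is an illustration only; the variable names are chosen here and do not come from Franco's text.

```python
import math

franco_ratio = 11 / 14        # Franco's assumed circle-to-square area ratio
true_ratio = math.pi / 4      # exact ratio: (pi r^2) / (2r)^2

radius = 7
circle_area = math.pi * radius**2   # true area of the circle of radius 7
rectangle_area = 11 * 14            # area of the 11 by 14 rectangle

# Relative amount by which 11/14 overestimates pi/4.
relative_error = (franco_ratio - true_ratio) / true_ratio

print(circle_area, rectangle_area, relative_error)
```

The rectangle overshoots the circle by a small fraction of one percent, so the approximation is good, but it is not exact.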


The Arab Connection 


The Greek cultural heritage was to some extent preserved throughout the 
middle ages by the shrinking Byzantine Empire centered at Constantino- 
ple. However, it was the Arab hegemony in the Mediterranean area that 
effectively nurtured this heritage during the centuries of the dark ages, and 
finally transmitted it to Western Europe. 

During the seventh and eighth centuries, the Muslim empire rapidly 
extended its domination along the southern crescent of the Mediterranean, 
from Persia and Syria in the east to Spain and Morocco in the west. A 
century of conquest was followed during the second half of the eighth 
century by the rapid development of an Islamic culture that avidly absorbed 
the knowledge of the newly conquered lands. The eastern capital of 
Baghdad became a new cosmopolitan center, a new Alexandria, where the 
ancient sciences of Greece, India, and Mesopotamia were studied. 

The mathematician and astronomer al-Khowarizmi worked at Baghdad 
during the early ninth century, and wrote historically important textbooks 
on arithmetic and algebra. The first of these was an exposition of the 
Hindu art of reckoning. The Hindus in India were fascinated by numbers 
and computations, but had little interest in geometry and deductive proofs. 
This division of interests worked to their practical advantage, for it 
permitted them to calculate freely with rational numbers and irrational 
roots alike, oblivious to the fine distinctions of commensurability and 
logical subtleties that had impeded the Greeks. They combined in a single 
system of numeration three separate features of various previous sys- 
tems—decimal (base 10) numerals, positional notation, and a zero symbol. 
This “Hindu-Arabic numeration,” essentially our present system, came 
into widespread use through the influence of al-Khowarizmi’s arithmetic. 

Al-Khowarizmi’s second book, entitled Al-jabr wa’l muqabalah, has 
provided us with the modern word “algebra.” Apparently “al-jabr” referred 
to the transposition of subtracted terms to the other side of an 
equation, and “muqabalah” to the cancellation of equal terms on the two 
sides of an equation. The first six chapters of the Al-jabr list routine 
procedures, in terms of illustrative examples, for the solution of those 
linear and quadratic equations with positive coefficients that have positive 
roots; negative roots are not considered. For a translation of these brief 
chapters, see Edward Grant’s medieval science source book ([10], 
pp. 106-111). 

Al-Khowarizmi’s treatment was entirely rhetorical or verbal, with num- 
bers even written out in word form. In this respect it was a retrogression 
from the Arithmetica of Diophantus (third century A.D.). Diophantus, 
whose problems dealt mainly with number theory rather than algebra in 
the present sense, had introduced abbreviations for powers of the unknown, 
such as $\Delta^{\Upsilon}$ for the square, $K^{\Upsilon}$ for the cube, $\Delta^{\Upsilon}\Delta$ for the square-square 
or fourth power, $\Delta K^{\Upsilon}$ for the square-cube or fifth power, etc. The 
chief merit of al-Khowarizmi’s exposition was its return to the Babylonian 
and Hindu tradition of working routinely with quantities as “mere” num- 
bers rather than geometric magnitudes, and of reducing the solution of 
equations to operational procedures or algorithms. Indeed, the word “algo- 
rithm” is derived from the author’s name: al-Khowarizmi—algorismi— 
algorithm. 

Al-Khowarizmi’s equations involved three kinds of quantities: roots, 
squares, and numbers (that is, $x$ terms, $x^2$ terms, and constants). He stated 
the equation $x^2 + 10x = 39$ as “a square and ten of its roots are equal to 
thirty-nine.” His solution reads essentially as follows: Take half the number 
of roots, that is, five, and multiply this by itself to obtain twenty-five. 
Add this to the thirty-nine, giving sixty-four. Take the square root, or eight, 
and subtract from it half the number of roots, namely five. The result, 
three, is the desired root. In terms of modern notation, he is saying that the 
(positive) root of $x^2 + 2bx = c$ is $-b + \sqrt{b^2 + c}$. 

[Figure 1]

Al-Khowarizmi verifies his algorithms by means of geometric construc- 
tions that indicate Greek influence. To explain the solution of the equation 
$x^2 + 10x = 39$, he starts with a square ABCD of area $x^2$ (Fig. 1). The 
equation calls for the addition of 10x, so he adds the rectangles CDEF and 
BCHI, each of area 5x. He “completes the square” by adding the square 
CFGH of area 25 (the square of “half the number of roots”). Since 
$x^2 + 10x$ equals 39, it follows that the larger square AEGI has area 
39 + 25 = 64. Hence its edge is 8, so x = 8 − 5 = 3. 
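
Al-Khowarizmi's verbal recipe translates almost word for word into a short routine. The Python sketch below is a modern paraphrase rather than anything found in the Al-jabr; the function name and parameter names are hypothetical, and it assumes an equation of the form x^2 + (roots)·x = (number) with both coefficients positive.

```python
import math

def al_khowarizmi_root(roots, number):
    """Positive root of x^2 + roots*x = number (roots > 0, number > 0)."""
    half = roots / 2                  # "take half the number of roots"
    square = half * half              # "multiply this by itself"
    total = square + number           # "add this to the thirty-nine"
    return math.sqrt(total) - half    # "take the square root ... and subtract half"

x = al_khowarizmi_root(10, 39)
print(x)  # 3.0
```

Geometrically, `total` is the area of the completed square AEGI and `math.sqrt(total)` is the length of its edge.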


EXERCISE 1. Solve the equation $x^2 + 8x = 65$ (for its positive root) by means of a 
construction similar to Figure 1. 


During the ninth and tenth centuries Euclid’s Elements and many of the 
works of Archimedes, Apollonius, and Ptolemy were translated from 
Greek into Arabic. Some of them are extant today only because of these 
Arabic translations. The Greek masterpieces were studied carefully, and 
alternative proofs and generalizations (e.g., of the Pythagorean theorem) 
were produced, indicating a reasonable level of understanding of Greek 
mathematics in Islam. 

Arab mathematical science reached its apex in the eleventh century. 
Al-Haitham (ca. 965-1039), known in the West as Alhazen, wrote an 
influential treatise on geometrical optics and extended some of Archi- 
medes’ volume results. For example, he showed that, if a segment of a 
parabola is revolved about its base (rather than about its axis, as in 
Archimedes’ On Conoids), then the volume of the solid obtained is 8/15 
that of the circumscribed cylinder. This computation required formulas for 
the sums of the first n cubes and fourth powers, whereas Archimedes had 
used only the formulas for the sums of the first n integers and of their 
squares. The following exercise outlines Alhazen’s ingenious geometric 
derivation of these formulas, which were to play a continuing role in the 
development of the calculus. It is based on Figure 2, from which we read 
off the formula 

    $(n+1)\sum_{i=1}^{n} i^k = \sum_{i=1}^{n} i^{k+1} + \sum_{p=1}^{n}\Bigl(\sum_{i=1}^{p} i^k\Bigr)$.    (1)

[Figure 2]


EXERCISE 2. (a) Substitute k = 1 and the formula 

    $1 + 2 + \cdots + n = \tfrac{1}{2}n^2 + \tfrac{1}{2}n$    (2)

into (1) to derive the formula 

    $1^2 + 2^2 + \cdots + n^2 = \tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n$.    (3)

(b) Substitute k = 2 and equation (3) into (1) to derive the formula 

    $1^3 + 2^3 + \cdots + n^3 = \tfrac{1}{4}n^4 + \tfrac{1}{2}n^3 + \tfrac{1}{4}n^2$.    (4)

(c) Substitute k = 3 and equation (4) into (1) to derive the formula 

    $1^4 + 2^4 + \cdots + n^4 = \tfrac{1}{5}n^5 + \tfrac{1}{2}n^4 + \tfrac{1}{3}n^3 - \tfrac{1}{30}n$.    (5)

(d) Apply the above formulas to show that 

    $\sum_{i=1}^{n} (n^2 - i^2)^2 = n^5 - 2n^2\sum_{i=1}^{n} i^2 + \sum_{i=1}^{n} i^4 = \tfrac{8}{15}n^5 - \tfrac{1}{2}n^4 - \tfrac{1}{30}n$.    (6)

Add $n^4$ to both sides of (6) to obtain 

    $\sum_{i=0}^{n-1} (n^2 - i^2)^2 = \tfrac{8}{15}n^5 + \tfrac{1}{2}n^4 - \tfrac{1}{30}n$.    (7)
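
Before attempting the derivations in Exercise 2, it may help to confirm numerically that the relation read off from Figure 2 and the closed forms for the power sums agree with direct addition. The Python sketch below does this with exact rational arithmetic. It is a numerical check under stated assumptions, not a proof and not Alhazen's argument, and the function names are hypothetical.

```python
from fractions import Fraction as F

def power_sum(n, k):
    """Return 1^k + 2^k + ... + n^k by direct addition."""
    return sum(i**k for i in range(1, n + 1))

def alhazen_identity_holds(n, k):
    """Check (n+1) * S_k(n) == S_{k+1}(n) + sum of S_k(p) for p = 1..n."""
    lhs = (n + 1) * power_sum(n, k)
    rhs = power_sum(n, k + 1) + sum(power_sum(p, k) for p in range(1, n + 1))
    return lhs == rhs

# Closed forms for the sums of first, second, third and fourth powers.
CLOSED_FORMS = {
    1: lambda n: F(1, 2) * n**2 + F(1, 2) * n,
    2: lambda n: F(1, 3) * n**3 + F(1, 2) * n**2 + F(1, 6) * n,
    3: lambda n: F(1, 4) * n**4 + F(1, 2) * n**3 + F(1, 4) * n**2,
    4: lambda n: F(1, 5) * n**5 + F(1, 2) * n**4 + F(1, 3) * n**3 - F(1, 30) * n,
}

def formula_6(n):
    """Closed form for the sum over i = 1..n of (n^2 - i^2)^2."""
    return F(8, 15) * n**5 - F(1, 2) * n**4 - F(1, 30) * n

for n in range(1, 30):
    assert all(alhazen_identity_holds(n, k) for k in (1, 2, 3))
    assert all(CLOSED_FORMS[k](n) == power_sum(n, k) for k in CLOSED_FORMS)
    assert formula_6(n) == sum((n * n - i * i) ** 2 for i in range(1, n + 1))
```

Agreement for n up to 29 does not establish the formulas in general, but it catches transcription slips in the coefficients.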
[Figure 3]


EXERCISE 3. Let the segment $A_0C_0C_n$ (Fig. 3) of the parabola $x = ky^2$, over the 
interval [0, a] on the x-axis, be revolved around the ordinate $C_0C_n$ to obtain the 
solid of revolution S. Denote by Z the circumscribed cylinder with radius a and 
height b, where $a = kb^2$. Let the points $A_1, \ldots, A_{n-1}$ divide the interval [0, b] on 
the y-axis into n equal subintervals of length h = b/n. Let $B_1, \ldots, B_{n-1}$ and 
$C_1, \ldots, C_{n-1}$ be the corresponding points on the parabola and the ordinate $C_0C_n$, 
respectively. Finally, let P be the union of the cylinders with radius $B_iC_i$ and height 
$C_{i-1}C_i$, and Q the union of the cylinders with radius $B_{i-1}C_{i-1}$ and height $C_{i-1}C_i$, 
$i = 1, \ldots, n$. Then $P \subset S \subset Q$. 
(a) Show that 

    $v(P) = \sum_{i=1}^{n} \pi k^2 h^5 (n^2 - i^2)^2$

and 

    $v(Q) = \sum_{i=0}^{n-1} \pi k^2 h^5 (n^2 - i^2)^2$.

(b) Apply Equations (6) and (7) to show that 

    $v(P) < \tfrac{8}{15}\,v(Z) < v(Q)$.

Hint: Calculate v(P)/v(Z) and v(Q)/v(Z). 
(c) Conclude that v(S) = 8v(Z)/15 as desired. 
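
The squeeze in Exercise 3 can be watched numerically. Since v(Z) = πa²b = πk²h⁵n⁵, the common factor πk²h⁵ cancels in the ratios v(P)/v(Z) and v(Q)/v(Z), leaving only sums of integers divided by n⁵. The Python sketch below rests on that simplification (worked out here, not quoted from the text) and uses a hypothetical function name.

```python
from fractions import Fraction as F

def volume_ratios(n):
    """Return (v(P)/v(Z), v(Q)/v(Z)) for n slabs, as exact fractions."""
    inner = sum((n * n - i * i) ** 2 for i in range(1, n + 1))  # cylinders of P
    outer = sum((n * n - i * i) ** 2 for i in range(0, n))      # cylinders of Q
    return F(inner, n**5), F(outer, n**5)

target = F(8, 15)
for n in (1, 10, 100, 1000):
    low, high = volume_ratios(n)
    # The inscribed and circumscribed stacks bracket 8/15 and differ by exactly 1/n.
    print(n, float(low), float(high), low < target < high)
```

Because the gap between the two ratios is 1/n, it can be made smaller than any given amount, which is what a compression argument for part (c) requires.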


For approximately four centuries the Muslim world preserved the Greek 
mathematical tradition and enriched it with the addition of Eastern ele- 
ments of arithmetic and algebra. By the twelfth century Arabic science had 
begun to decline but, fortunately, Western Europe had emerged from its 
dark ages with an appetite for new knowledge. The Elements of Euclid was 
translated from Arabic into Latin in 1142 by Adelard of Bath, and Robert 
of Chester produced a Latin translation of al-Khowarizmi’s Algebra in 
1145. The most prolific of a small army of Latin translators in Spain, after 
the reclamation of Toledo from the Moors, was Gerard of Cremona 
(1114-1187) who produced an improved translation of the Elements as well 
as some of Archimedes’ works, including Measurement of the Circle and 
parts of On the Sphere and Cylinder. In 1269 William of Moerbeke 
published a Latin translation of the extant Greek corpus of Archimedean 
treatises. 

These translations served to reestablish quantitative science in Western 
Europe, at least at the level of elementary algebra and geometry. However, 
the works of Archimedes were too sophisticated for immediate or 
widespread assimilation, and did not bear significant fruit until the six- 
teenth and seventeenth centuries. The principal medieval contributions to 
later progress in mathematics stemmed from the Scholastic speculations on 
continuity and variability. 


Medieval Speculations on Motion and Variability 


The age of Latin translation brought the ancient wisdom to a now dynamic 
Europe with flourishing new universities. Whereas the gap between 
Boethius and Archimedes would take several centuries to bridge, the vast 
intellectual system encompassed by the scientific and philosophical trea- 
tises of Aristotle was more accessible. Indeed, the absorption and assimila- 
tion of Aristotelian thought that took place in the thirteenth century makes 
it the major turning point in Western intellectual history between the 
fourth century B.c.—the heroic age of Plato, Aristotle, and Euclid—and 
the scientific revolution of the seventeenth century. 

Aristotle’s treatise on Physics had explored the nature of the infinite, the 
existence of indivisibles or infinitesimals, and the divisibility of continuous 
quantities—time, motion, and geometric magnitudes. Having pointed out 
that “motion is supposed to belong to the class of things which are 
continuous; and the infinite presents itself first in the continuous—that is 
how it comes about that ‘infinite’ is often used in definitions of the 
continuous (‘what is infinitely divisible is continuous’)” [Book III, Chapt. 
1], Aristotle charged scholars “to discuss the infinite and to inquire 
whether there is such a thing or not, and, if there is, what it is” [Book III, 
Chapt. 4]. 

The medieval Scholastic philosophers responded to this challenge with 
evident relish. Their detailed (and often interminably prolix) speculations 
and disputations on the infinite, the nature of the continuum, and the 
existence of indivisibles were more philosophical than mathematical in 
character, and were generally inconclusive from a scientific viewpoint. 
However, they frequently showed a keen appreciation of logical difficulties 
that were not finally resolved until the late nineteenth century. This 
“sub-mathematical” activity of the late middle ages no doubt enhanced the 
acceptability of infinitesimal techniques that were officially taboo in Greek 
mathematics but were freely used to great advantage in the seventeenth 
century. 

Of more immediate consequence were the quantitative studies of change 
and motion that began in the early fourteenth century. The concept of 
continuous variation of quantities had played no role in the mathematics 
of the Greeks—their quantities were either numerical and discrete or 
geometric and static. Their algebra dealt with constants rather than vari- 
ables, and their geometry treated fixed and unchanging geometric figures. 
They studied only uniform (linear or circular) motion, so such concepts as 
acceleration and instantaneous velocity had no meaning to the Greeks. In 
short, Greek science did not discuss phenomena of change or variability in 
quantitative terms. 

The problem of quantifying change was attacked during the second 
quarter of the fourteenth century by a group of logicians and natural 
philosophers at Merton College in Oxford, including Thomas Bradwardine 
(the “Doctor profundus” of his day and later Archbishop of Canterbury) 
and Richard Swineshead (known to medievals as the Calculator, as Aristo- 
tle was the Philosopher and Paul the Apostle). They were concerned 
specifically with what was then referred to as the latitude of forms, and 
might today be described as the intensity of qualities. In Aristotelian 
philosophy, qualities were attributes that admit of intensity (at a point of a 
body or at an instant in time), such as hotness and density. Intensive (or 
local) qualities were distinguished from the corresponding extensive (or 
global) quantities, such as heat and weight. Analogously, (instantaneous) 
velocity was regarded as a quality, the intensity of motion; the correspond- 
ing quantity was the total motion, i.e. the distance covered. The Merton 
scholars sought to study variations in the intensity of a quality, from point 
to point of a body, or from point to point in time. We will restrict our 
discussion here to their consideration of the case of motion and velocity. 

They recognized that the heart of the matter is the framing of definitions 
of these terms that provide an adequate basis for quantitative analysis. 
They defined motion to be uniform (i.e. constant speed) if equal distances 
are described in equal times. Uniform acceleration was defined to be that 
for which equal increments of velocity are acquired in equal intervals of 
time. For even this simplest case of variable motion, a definition of 
instantaneous velocity was needed. Lacking the notion of limits of ratios, 
they could only define instantaneous velocity in terms of the distance that 
would be traversed by a point if it moved uniformly over a period of time 
with the same speed it possessed at the instant in question. Although 
circular in nature, this intuitive concept of instantaneous velocity sufficed 
for the derivation of correct results in the fourteenth century (as it did in 
the early days of calculus, and continues today to do in everyday scientific 
discourse). 

The central result derived from these concepts was the Merton Rule of 
uniform acceleration, the mean speed theorem: 
If a moving body is uniformly accelerated during a given time interval, 
then the total distance s traversed is that which it would move during the 
same time interval with a uniform velocity equal to the average of its 
initial velocity $v_0$ and its final velocity $v_f$ (namely, its instantaneous 
velocity at the midpoint of the time interval). 


That is, 

    $s = \tfrac{1}{2}(v_0 + v_f)\,t$,    (8)


where $t$ is the length of the time interval. A number of lengthy and 
rhetorical but ingenious derivations of this theorem were given by the 
Merton scholars; accounts and discussions of them may be found in 
Chapter 5 of Clagett [5]. 
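
The parenthetical claim in the Merton Rule, that the mean of the initial and final velocities is the instantaneous velocity at the midpoint of the time interval, follows from the definition of uniform acceleration and can be illustrated with exact arithmetic. The Python sketch below is a modern restatement with hypothetical function names, and the numerical values are arbitrary illustrations rather than data from the Merton scholars.

```python
from fractions import Fraction as F

def velocity(v0, vf, t, tau):
    """Speed at time tau when equal increments of speed accrue in equal times."""
    return v0 + (vf - v0) * F(tau) / t

def merton_distance(v0, vf, t):
    """Distance given by the Merton Rule: mean of end speeds times elapsed time."""
    return F(1, 2) * (v0 + vf) * t

# Illustrative values only (assumed, not from the text).
v0, vf, t = F(3), F(11), F(4)

midpoint_speed = velocity(v0, vf, t, t / 2)
print(midpoint_speed, merton_distance(v0, vf, t))
```

With these values the midpoint speed equals the mean of 3 and 11, and the rule's distance is that speed maintained uniformly for the whole interval.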


EXERCISE 4. Rewrite the Merton Rule (8) in the form 
    $s = v_0 t + \tfrac{1}{2}at^2$.    (9)


What is the number a, and why is it constant (independent of t)? 


EXERCISE 5. To identify (9) as a calculus result, obtain it by antidifferentiation, 
starting with $s''(t) = a$, $s'(0) = v_0$. 


The Merton studies spread to France and Italy in the mid-fourteenth 
century. In his Treatise on the Configurations of Qualities and Motions, 
written in the 1350s, the Parisian scholastic Nicole Oresme introduced the 
important concept of graphical representations, or geometrical “configura- 
tions”, of intensities of qualities. English translations of this work can be 
found in Grant’s medieval science source book [10] or Clagett’s compre- 
hensive analysis of the medieval geometry of qualities and motions [7]. 

Oresme discusses mainly the case of a “linear” quality, one whose 
“extension” is measured by an interval (line segment) of either space (as in 
the case of a rod of variable density) or time (as in the case of a moving 
point). He proposes to measure the intensity of the quality at each point of 
the reference interval by a perpendicular line segment at that point, 
thereby constructing a graph with the reference interval as its base. As he 
says at the beginning of his treatise, 


Every measurable thing except numbers is imagined in the manner of 
continuous quantity. Therefore, for the mensuration of such a thing, it is 
necessary that points, lines, and surfaces, or their properties, be imagined. 
For in them (i.e., the geometrical entities), as the Philosopher has it, 
measure or ratio is initially found .... Therefore, every intensity which 
can be acquired successively ought to be imagined by a straight line 
perpendicularly erected on some point of the space or subject of the 
intensible thing, e.g., a quality .... And since the quantity or ratio of 
lines is better known and is more readily conceived by us—nay the line is 
in the first species of continua, therefore such intensity ought to be 
imagined by lines.... Therefore, equal intensities are designated by 
equal lines, a double intensity by a double line, and always in the same 
way if one proceeds proportionally. 

[Figure 4]


He refers to the reference interval for a quality as its longitude, and its 
intensity at a point as its latitude or altitude there (perhaps adopting these 
terms from their geographical use). Finally, he specifies that the quantity of 
a linear quality is to be “imagined by” its configuration as described above. 

For example, in the case of a uniformly accelerated motion during a
time interval $[0, t]$ corresponding to the longitude AB in Figure 4, the
latitude at each point P of AB is an ordinate PQ whose length is
the velocity at the corresponding instant, so the upper edge CD of the
configuration is simply a time-velocity graph. Oresme saw that the definition
of uniform acceleration implies that CD is a straight line segment, so
the configuration is a trapezoid with base $AB = t$ and heights $AD = v_0$ and
$BC = v_f$. He assumed without explicit proof that the area $s$ of this trapezoid
equals the total distance traveled, perhaps on the basis of regarding this
area as made up of very many vertical segments or indivisibles, each
representing a velocity continued for a very short or infinitesimal time. At
any rate, it follows immediately from the formula for the area of a
trapezoid that


$$s = \tfrac{1}{2}(v_0 + v_f)\,t, \tag{8}$$


so Oresme has provided the Merton Rule with a geometrical verification. 
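
The step Oresme assumed without proof, that the area under the velocity graph equals the distance traveled, can be illustrated numerically. The following is only a sketch: the function names and the sample values (initial velocity 2, final velocity 10, time 4) are illustrative assumptions, and the thin strips are a modern stand-in for his "indivisibles".

```python
def distance_by_strips(v0, vf, t, n=100_000):
    """Add up n thin vertical strips under the straight velocity graph from v0 to vf."""
    dt = t / n
    total = 0.0
    for i in range(n):
        # velocity at the midpoint of the i-th subinterval (uniform acceleration)
        v_mid = v0 + (vf - v0) * (i + 0.5) / n
        total += v_mid * dt
    return total


def distance_by_trapezoid(v0, vf, t):
    """Area of the trapezoid with base t and heights v0 and vf, as in formula (8)."""
    return 0.5 * (v0 + vf) * t


strips = distance_by_strips(2.0, 10.0, 4.0)
area = distance_by_trapezoid(2.0, 10.0, 4.0)
print(strips, area)
```

The two numbers agree up to rounding, which is what the Merton Rule asserts for this particular motion.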


EXERCISE 6. Consider the case of uniformly accelerated motion with $v_0 = 0$, so the
trapezoid reduces to a triangle (Fig. 5). Subdivide the base AB into n equal time
subintervals, and denote by $s_1, s_2, s_3, \ldots, s_n$ the distances traveled during these
successive subintervals. Show that these distances are proportional to the odd
numbers $1, 3, 5, \ldots, 2n-1$, that is,

$$s_1 : s_2 : s_3 : \cdots : s_n = 1 : 3 : 5 : \cdots : (2n-1).$$

This “law of odd numbers,” for the distances traveled in successive equal intervals
of time under constant or uniform acceleration, was important for Galileo’s later 
empirical verification that freely falling bodies (near the surface of the earth) 
experience constant acceleration. 
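
The law of odd numbers can be checked with exact arithmetic. This sketch does not replace the geometric argument the exercise asks for. It simply takes the distance formula $s = at^2/2$ for motion from rest as given, and the function name and the sample values a = 2, t = 5, n = 6 are assumptions chosen for illustration.

```python
from fractions import Fraction


def interval_distances(a, t, n):
    """Distances covered in n equal subintervals of [0, t], starting from rest
    with constant acceleration a, using s(tau) = a * tau**2 / 2."""
    def s(tau):
        return Fraction(a) * tau * tau / 2

    step = Fraction(t, n)
    return [s((i + 1) * step) - s(i * step) for i in range(n)]


d = interval_distances(a=2, t=5, n=6)
ratios = [x / d[0] for x in d]   # each distance relative to the first one
print([int(r) for r in ratios])
```

Changing a, t, or n changes the distances themselves but not the pattern of their ratios.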


In his Treatise on Configurations Oresme introduced, at least implicitly, 
four innovative ideas: 

(1) the measurement of diverse types of physical variables (such as 
temperature, density, velocity) by means of line segments (in lieu of real 
numbers, following Greek precepts); 

(2) some notion of a functional relationship between variables (e.g., 
velocity as a function of time); 

(3) a diagrammatic or graphical representation of such a functional 
relationship. This may be regarded as a partial step towards the introduc- 
tion of a coordinate system; 

(4) a conceptual process of “integration” or continuous summation to 
calculate distance as the area under a velocity-time graph, albeit Oresme 
only had the technical machinery to perform this calculation in the case of 
uniformly accelerated motion. 

Mature versions of these incipient ideas played key roles in the seven- 
teenth century development of the calculus. The work of Oresme and the 
Merton scholars on motion was widely disseminated in Europe for the next 
two centuries, and undoubtedly led to the work of Galileo (once thought to 
have been original with him), who assembled the medieval components 
into a new science of mechanics. For example, the Third Day of Galileo’s 
Discourses on Two New Sciences (1638) begins with the mean speed 
theorem, with a proof and accompanying geometric diagram that are 
strikingly similar to those of Oresme, and proceeds to the distance formula 
$s = at^2/2$ for uniformly accelerated motion from rest (see Equation (9)),
from which the law of odd numbers is then derived (as in Exercise 6). 
Chapter II of Clagett [7], entitled “The Configuration Doctrine in Histori- 
cal Perspective,” gives a detailed account of the origins and subsequent 
influence of the medieval geometry of motions. 

Medieval Infinite Series Summations


The subject of infinite series fascinated medieval philosophers and 
mathematicians, appealing to both their interest in the infinite and their 
disputatious delight with apparent paradoxes. The work of the Merton 
scholars on the latitude of forms led naturally to various infinite series 
problems. For example, Swineshead solved a problem that, when stated in 
terms of motion, reads as follows. 


If a point moves throughout the first half of a certain time interval with a 
constant velocity, throughout the next quarter of the interval at double 
the initial velocity, throughout the following eighth at triple the initial 
velocity, and so on ad infinitum; then the average velocity during the 
whole time interval will be double the initial velocity. 


Taking both the time interval and the initial velocity as unity, this is 
equivalent to the summation 

$$\frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \cdots + \frac{n}{2^n} + \cdots = 2. \tag{10}$$

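
A quick numerical look at the partial sums makes the claim in (10) plausible before any proof is given. This is a sketch only; the function name is an illustrative assumption.

```python
from fractions import Fraction


def swineshead_partial_sum(n_terms):
    """Exact sum of k / 2**k for k = 1, ..., n_terms."""
    return sum(Fraction(k, 2**k) for k in range(1, n_terms + 1))


for n in (1, 2, 5, 10, 20):
    print(n, float(swineshead_partial_sum(n)))
```

The partial sums increase toward 2 without reaching it. After n terms the shortfall is exactly $(n+2)/2^n$, which can be confirmed by induction.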
Swineshead gave a long and tedious verbal proof of (10). It is equivalent 
to arguing that the effect of doubling the velocity during the last half of the 
interval is equivalent to that of doubling it during the first half of the 
interval; the additional effect (over doubling) of tripling the velocity during 
the last quarter of the interval is equivalent to that of doubling it during 
the second subinterval (of length one-fourth); the additional effect (over 
tripling) of quadrupling it during the last eighth of the interval is equivalent
to that of doubling it during the third subinterval (of length one-eighth);
and so on ad infinitum. Hence the total cumulative effect is the
same as that of doubling the initial velocity during all of the subintervals. 
This appears to be the first infinite series summation, other than geomet- 
ric series such as 


ee es (11) 


which Archimedes effectively used in the Quadrature of the Parabola 
(albeit without actually extending the sum to infinity). In a tract written 
around 1350, Oresme gave the more general geometric series 


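$$\frac{a}{k} + \frac{a}{k}\left(1 - \frac{1}{k}\right) + \cdots + \frac{a}{k}\left(1 - \frac{1}{k}\right)^n + \cdots = a \tag{12}$$


(where $k$ is an integer greater than one) which includes (11) as a special
case (why?). He stated (12) verbally as follows: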


If an aliquot part [one kth] should be taken from some quantity [a], and 
from the first remainder such a part is taken, and from the second 
remainder such a part is taken, and so on into infinity, such a quantity
would be consumed exactly—no more, no less—by such a mode of 
subtraction ([10], p. 133).


The proof is that, after the first part is subtracted, the remainder is
$a(1 - 1/k)$; after a $k$th part of this is subtracted, the (second) remainder is
$a(1 - 1/k)^2$; etc. Each subtraction multiplies the previous remainder by
$(1 - 1/k)$, so the $n$th remainder is $a(1 - 1/k)^n$. This means that

$$\frac{a}{k} + \frac{a}{k}\left(1 - \frac{1}{k}\right) + \cdots + \frac{a}{k}\left(1 - \frac{1}{k}\right)^{n-1} + a\left(1 - \frac{1}{k}\right)^n = a.$$

Since $a(1 - 1/k)^n$ obviously approaches zero as $n$ goes to infinity, (12)
follows, as desired. Oresme finds interesting the corollary that “if one-thousandth
part of a foot were taken away [or removed], then [if] one-thousandth
part of the remainder of this foot [were removed], and so on
into infinity, exactly one foot would be subtracted from this [original
foot].”
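
Oresme's repeated subtraction is easy to simulate with exact fractions. The sketch below is illustrative only: the function name and the sample values a = 1, k = 4 with 20 steps are assumptions, chosen so that the remainder becomes small quickly.

```python
from fractions import Fraction


def oresme_subtraction(a, k, steps):
    """Repeatedly remove one k-th of what remains.

    Returns (total_removed, remainder) after the given number of steps."""
    remainder = Fraction(a)
    removed = Fraction(0)
    for _ in range(steps):
        part = remainder / k      # one k-th of what is left
        removed += part
        remainder -= part
    return removed, remainder


removed, remainder = oresme_subtraction(a=1, k=4, steps=20)
print(float(removed), float(remainder))
```

At every step the amount removed plus the remainder equals the original quantity, and the remainder after n steps is $a(1 - 1/k)^n$. For Oresme's example k = 1000 the same loop applies, but the remainder shrinks only by the factor 999/1000 at each step.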

In the same tract Oresme proves that the harmonic series

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots$$

diverges, meaning that if the successive terms were added one-by-one then,
as he puts it, “the whole would become infinite.” In proof of this he points
out that the sum of $\frac{1}{3}$ and $\frac{1}{4}$ is greater than $\frac{1}{2}$, as is the sum of the next four
terms $\frac{1}{5}$ through $\frac{1}{8}$, as is the sum of the next eight terms $\frac{1}{9}$ through $\frac{1}{16}$, etc.
This was the first example of a divergent series which “has a chance” of 
converging because its terms approach zero. 
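
Oresme's grouping argument can be checked block by block. In this sketch the indexing of the blocks by an integer m is an assumption made for convenience: block m consists of the $2^m$ terms from $1/(2^m + 1)$ through $1/2^{m+1}$.

```python
from fractions import Fraction


def block_sum(m):
    """Sum of the block 1/(2**m + 1) + ... + 1/2**(m + 1), for m >= 1."""
    return sum(Fraction(1, j) for j in range(2**m + 1, 2**(m + 1) + 1))


blocks = [block_sum(m) for m in range(1, 8)]
print([float(b) for b in blocks])
```

Every block exceeds one half, because it contains $2^m$ terms each at least as large as $1/2^{m+1}$. Adding enough blocks therefore pushes the partial sums past any given bound.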

In his Treatise on Configurations, Oresme gave a geometric method for
summing series (10). Figure 6 shows two dissections of the configuration or
graph of Swineshead’s motion, with velocity 1 during the first half of the
unit time interval, velocity 2 during the next quarter, velocity 3 during the
next eighth, etc. Since it is clear that $a(A_n) = n/2^n$ and $a(B_n) = 1/2^n$ for each
$n$, we see that

$$\frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \cdots + \frac{n}{2^n} + \cdots = \sum_{n=1}^{\infty} a(A_n) = \sum_{n=0}^{\infty} a(B_n) = 1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n} + \cdots = 2,$$

since the latter series is the geometric series (12) with a = 1, k = 2.

[Figure 6: two dissections of the graph of Swineshead’s motion, into regions $A_n$ and $B_n$]
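
The two dissections can be compared on a truncated version of the figure. The sketch below keeps only the first N subintervals, an assumption needed to make the computation finite, and treats each $B_m$ as the unit-height layer lying under every strip whose velocity exceeds m.

```python
from fractions import Fraction

N = 12  # number of time subintervals kept (the actual figure continues indefinitely)

# Vertical strips A_n: velocity n held for a time 1/2**n, for n = 1..N.
column_areas = [Fraction(n, 2**n) for n in range(1, N + 1)]

# Horizontal layers B_m of unit height, for m = 0..N-1: layer m lies under the
# strips m+1..N, so its width is the total duration of those strips.
row_areas = [sum(Fraction(1, 2**n) for n in range(m + 1, N + 1)) for m in range(N)]

print(sum(column_areas) == sum(row_areas))
print([float(w) for w in row_areas[:4]])
```

Both dissections cover the same truncated region, so the totals agree exactly. As N grows, the width of layer m approaches $1/2^m$, which is the geometric series used in the text.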


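EXERCISE 7. Apply (12) to show that

$$\frac{3}{4} + \frac{3}{16} + \frac{3}{64} + \cdots + \frac{3}{4^n} + \cdots = 1.$$


EXERCISE 8. Apply Oresme’s geometric method to show that

$$1 \cdot \frac{3}{4} + 2 \cdot \frac{3}{16} + 3 \cdot \frac{3}{64} + \cdots + n \cdot \frac{3}{4^n} + \cdots = \frac{4}{3}.$$

Think of a motion with velocity 1 during the first $\frac{3}{4}$ of a unit time interval, with
velocity 2 during the next $\frac{3}{16}$ of the interval, velocity 3 during the next $\frac{3}{64}$, etc.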


The study of infinite series continued during the fifteenth and sixteenth 
centuries in the mode of Swineshead and Oresme, without significant 
advance over their exclusively verbal and geometrical techniques. The 
principal contribution of these early infinite series investigations lay not in 
the particular results obtained, but in the encouragement of a new point of 
view—the free acceptance of infinite processes in mathematics. Medieval 
currents of thought thus prepared the way for the more significant work on 
infinite series and processes of the seventeenth century, when a more 
potent arsenal of arithmetic and algebraic techniques was available. 


The Analytic Art of Viète


The scientific and cultural Renaissance of the fifteenth and sixteenth 
centuries is often associated with the increased availability of the ancient 
Greek classics resulting from the invention of the printing press. In 
mathematics, however, the Renaissance consisted largely of rapid progress 
in the area of algebra, and this progress stemmed less from restored 
classical traditions than from the practical arithmetic and algebra of late 
medieval commercial circles that was based on problem-solving methods 
dating back to al-Khowarizmi. The publication in 1545 of Cardan’s Ars 
Magna served to publicize the solutions by del Ferro and Tartaglia of 
cubic equations and by Ferrari of quartic equations. These exciting discoveries
stimulated an accelerated development of algebraic techniques.

The algebra of this time was still largely verbal rather than symbolic, 
although the use of abbreviations (such as the Italian p and m for plus and 
minus) and symbols (such as the German + and −) was gradually
emerging. In terms of its problems, the algebra of the early sixteenth 
century concentrated on finding the unknown in a given equation with 
specific numerical coefficients. As a consequence, algebra was still essen- 
tially a “bag of tricks” rather than a general method, because every special 
case required a different trick. The idea of studying a general equation 
representing a whole class of equations had not yet made its appearance. 

In order to study the general cubic equation, for example, it is necessary 
to distinguish between the roles of the unknown variable whose value is 
sought and of the coefficients which are parameters in the problem—their 
values are unspecified even though they are assumed known in advance. 
This crucial idea, of a clear-cut distinction between parameters (known) 
and variables (unknown), was contributed by the Frenchman François
Viète (1540-1603). In his Introduction to the Analytic Art of 1591 (see the
book of Klein [11] for an English translation) he wrote: 


In order that [the setting up of equations] be aided by some art, it is
necessary that the given [parameters] be distinguished from the unknown
[variables] being sought by a constant, perpetual, and highly conspicuous
convention (symbolo), such as by designating the [variables] being sought
by the letter A or some other vowel E, I, O, U, V, and the given
[parameters] by the letters B, G, D, or other consonants.


Thus Viète systematically used vowels for variables and consonants for
parameters. In the designation of algebraic operations his notation was
“syncopated,” involving a combination of verbal abbreviations (such as A
quadratus and A cubus for $A^2$ and $A^3$) and symbols (such as + and −).
For example he would write


A cub + B plano in A aequatur C in A quad + D solido

for

$$A^3 + BA = CA^2 + D.$$


The terms plano and solido are included to preserve homogeneity of degree 
in the equation; the symbol = for equality (aequalis) was not yet in 
common use, although it had been introduced in 1557 by the Englishman 
Robert Recorde. The transition to a fully symbolic algebraic notation took 
place during the interval between Viète and Descartes.

Although his operational notation was still somewhat primitive by 
modern standards, Viète’s clarification of the distinct roles of variables and
parameters was an important step towards the seventeenth century reorientation
of mathematics, from the study of particular problems to the
search for general methods. The ability to deal with parameters as such
focused attention on solution procedures rather than specific solutions 
themselves, and on questions concerning relationships between different 
problems. This shift in emphasis, from the particular to the general, was a 
necessary ingredient for the algorithmic approach that characterizes the 
calculus. 


The Analytic Geometry of Descartes and Fermat 


The final step in preparation for the new infinitesimal mathematics, and 
the most far-reaching one, was the origination of analytic geometry by 
René Descartes (1596-1650) and Pierre de Fermat (1601-1665). Descartes’
Geometry was published in 1637 as one of three appendices to his Discourse 
on the Method (of Reasoning Well and Seeking Truth in the Sciences). In 
the same year Fermat sent to his correspondents in Paris his Introduction to 
Plane and Solid Loci. These two essays established the foundations for 
analytic geometry. Although Fermat’s work was more systematic in some 
respects, it was not actually published until 1679 after his death, and for 
this reason we speak today of Cartesian geometry rather than Fermatian 
geometry. 

The central idea of analytic geometry is the correspondence between an 
equation f(x, y) =0 and the locus (generally a curve) consisting of all those 
points whose coordinates (x, y) relative to two fixed perpendicular axes 
satisfy the equation. Actually, neither Descartes nor Fermat systematically 
used two coordinate axes in the way now standard. The closest either came 
is indicated by Fermat’s guiding principle: 


Whenever in a final equation two unknown quantities are found, we have 
a locus, the extremity of one of these describing a line, straight or curved. 


For Fermat (as well as Descartes) the two unknown quantities in an 
equation were line segments rather than numbers. One of these was 
measured to the right from a reference point on a horizontal axis, and the 
second was placed as a vertical ordinate at the endpoint of the first 
(Fig. 7). Fermat’s principle then says that the endpoint of the ordinate 
describes the curve corresponding to the given equation. Descartes’ general 
practice was similar, so both, in fact, dealt with “ordinate geometry” rather
than coordinate geometry. 

Fermat adhered to the algebraic notation of Viète, and designated his
variables as A and E instead of x and y. However, Descartes used the fully
symbolic algebraic notation that is standard today (or, more accurately, we
use Descartes’ notation), with the single exception that he wrote ∝ instead
of = for equality. He standardized the exponential notation for powers,
and initiated the common practice of using letters near the beginning of
the alphabet for parameters and those near the end for variables.

[Figure 7: the locus f(x, y) = 0, traced by the endpoint of a vertical ordinate erected on a horizontal axis with reference point O]

The aim of both Descartes and Fermat was to apply the methods of 
Renaissance algebra to the solution of problems in geometry. Descartes 
stated the plan as follows ([13], pp. 6-8): 


If, then, we wish to solve any problem, we first suppose the problem 
already effected, and give names [symbols] to all the lines that seem
needful for its construction—to those that are unknown as well as to 

those that are known. Then, making no distinction between known and
unknown lines, we must unravel the difficulty in any way that shows most 
naturally the relations between these lines, until we find it possible to 
express a single quantity in two ways. This will constitute an equation, 
since the terms of one of these two expressions are together equal to the 
terms of the other. 


Thus Descartes started with a geometrical problem, ordinarily involving a 
given curve, defined either as a static locus in the usual Greek fashion or in 
terms of uniform continuous motion (as the Archimedean spiral). His 
procedure was to translate the geometrical problem into the language of an 
algebraic equation, then simplify and finally solve this equation. 

Whereas Descartes ordinarily began with a curve and derived its alge- 
braic equation, Fermat ordinarily began with an algebraic equation and 
derived from it the geometric properties of the corresponding curve. For 
example, he started with the general second degree equation in two 
variables, 


$$ax^2 + bxy + cy^2 + dx + ey + f = 0, \tag{13}$$


showed by translation and rotation techniques that its locus is a conic 
section (except for degenerate cases), and classified the various cases of 
(13) as to whether this conic section is an ellipse, hyperbola, or parabola. A 
discussion of this work can be found in Chapter III of Mahoney’s mathe- 
matical biography of Fermat [12]. 
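
In modern terms the classification of (13) is usually read off from the sign of the quantity $b^2 - 4ac$. The sketch below uses that modern criterion, which is not claimed to be Fermat's own procedure (he worked by reducing each case to a standard form). The function name is an illustrative assumption, and degenerate cases are assumed to have been excluded beforehand.

```python
def classify_conic(a, b, c):
    """Classify a*x**2 + b*x*y + c*y**2 + d*x + e*y + f = 0 by the sign of
    b**2 - 4*a*c, assuming the conic is non-degenerate."""
    if a == 0 and b == 0 and c == 0:
        raise ValueError("not a second degree equation")
    disc = b * b - 4 * a * c
    if disc < 0:
        return "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"


print(classify_conic(1, 0, 4))    # x^2 + 4y^2 + ... = 0
print(classify_conic(1, 0, 0))    # x^2 + ... = 0
print(classify_conic(0, 1, 0))    # xy + ... = 0
```

The linear coefficients d, e, f do not enter the test. They affect only the position of the curve and whether it degenerates.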

Thus the works of Descartes and Fermat, taken together, encompass the 
two complementary aspects of analytic geometry—studying equations by 
means of curves, and studying curves defined by equations. An important
common feature of their work in analytic geometry was their concentration
on indeterminate equations involving continuous variables. Viète, for example,
had studied only determinate equations, in which the “variable”,
although unknown, is actually a fixed constant to be found.

The notion of a variable, as first emphasized explicitly by Descartes and 
Fermat, was indispensable to the development of the calculus—the subject 
can hardly be discussed except in terms of continuous variables. Moreover, 
analytic geometry opened up a vast virgin territory of new curves to be 
studied, and called for the invention of algorithmic techniques for their 
systematic investigation. Whereas the Greek geometers had suffered from 
a paucity of known curves, a new curve could now be introduced by the 
simple act of writing down a new equation. In this way, analytic geometry 
provided both a much broadened field of play for the infinitesimal tech- 
niques of the seventeenth century, and the technical machinery needed for 
their elucidation. 

Early Indivisibles and
Infinitesimal Techniques 


Introduction 


During the late middle ages Euclid’s Elements and the works of Archi- 
medes had been extant, but not always generally accessible and never fully 
mastered. The sixteenth century saw, finally, the wide dissemination and 
serious study of these Greek mathematical masterworks. By the latter part 
of the century, the understanding of Archimedes’ work had reached the 
point that further progress along the lines of classical Greek mathematics 
was possible. During the century preceding Newton and Leibniz the 
method of exhaustion was refined and applied by numerous mathemati- 
cians to a wide variety of new quadrature, cubature, and rectification 
problems (see the reviews of this work by Baron [2], Chapter 3, and 
Whiteside [12], pp. 331-348). 

Although Archimedes’ accomplishments provided the chief inspiration 
for the resumption of mathematical progress, the time was ripe for the de- 
velopment of simpler new methods, ones that could be applied to the 
investigation of area and volume problems with greater ease than could the 
method of exhaustion with its tedious double reductio ad absurdum proofs. 
While continuing to regard Archimedean proofs as the ultimate models of 
rigor and precision, the Renaissance mathematical mind was more inter- 
ested in quick new results and methods of rapid discovery than in the 
stringent requirements of rigorous proof. The common view of the period 
was expressed in 1657 by Huygens as follows: 


In order to achieve the confidence of the experts it is not of great interest 
whether we give an absolute demonstration or such a foundation of it that
after having seen it they do not doubt that a perfect demonstration can be
given. I am willing to concede that it should appear in a clear, elegant,
and ingenious form, as in all works of Archimedes. But the first and most 
important thing is the mode of discovery itself, which men of learning 
delight in knowing. Hence it seems that we must above all follow that 
method by which this can be understood and presented most concisely 
and clearly. We then save ourselves the labor of writing, and others that 
of reading—those others who have no time to take notice of the enormous 
quantity of geometrical inventions which increase from day to day and in 
this learned century seem to grow beyond bounds if they must use the 
prolix and perfect method of the Ancients. (Struik [11], p. 189). 


The Greek “horror of the infinite” had prevented the development of a 
usable theory of limits to replace the ubiquitous double reductio ad 
absurdum proofs. But, as a result of the medieval scholastic speculations on 
infinity and the continuum, seventeenth century mathematicians were no 
longer reluctant to introduce infinitesimal techniques. Whereas the Greek 
insistence on absolute rigor had banished irrational magnitudes from the 
field of number (and hence number from geometry), irrational numbers 
now came to be freely employed, even though they still had no logical 
basis as numbers, and could only be interpreted rigorously as geometrical 
magnitudes. In addition, the symbolic algebra of Viète and Descartes
facilitated the development of formal techniques that emphasized com- 
putational method more than logical proof. Finally, the algebraic repre- 
sentation of curves (analytic geometry) permitted the rapid and easy 
formulation of new and diverse area and volume problems for investiga- 
tion. 

This rich amalgam of mathematical ingredients—Archimedean prob- 
lems, algebraic computational techniques, and the free use of intuitive 
concepts of the infinite—produced a profusion of powerful (if loosely 
based) infinitesimal methods for the solution of area and volume problems 
during the “century of anticipation” preceding the time of Newton and 
Leibniz. As we will see in this chapter, these developments constituted a 
gradual arithmetization of problems whose treatment in antiquity had been 
wholly geometric in character and approach. 


Johann Kepler (1571-1630) 


Kepler is most famous for his discovery of the laws of planetary motion, to 
the effect that (I) a planet moves along an elliptical orbit with the sun at
one focus of the ellipse, in such a way that (II) the radius vector from the 
sun to the planet sweeps out area at a constant rate, with (III) the squares 
of the periods of revolution of any two planets being proportional to the 
cubes of the major semi-axes of their orbits. Newton later showed that
these three laws, which Kepler deduced from observational data, follow
from the inverse-square law of gravitation.

[Figure 1: a planet’s elliptical orbit about the sun S, with apsides $P_1$ and $P_2$]

The second law (of areas), published in 1609, was derived by means of a 
curious combination of compensating errors. Kepler first deduced from 
astronomical observations that, when a planet is at either of its apsides
(the nearest and farthest points on its orbit from the sun), its velocity $v$
is inversely proportional to its distance $r$ from the sun. That is, if $v_1$ and $v_2$
are its velocities at the aphelion $P_1$ and perihelion $P_2$ (Fig. 1), then there is
a constant $k$ such that

$$v_1 = \frac{k}{r_1} \quad \text{and} \quad v_2 = \frac{k}{r_2}. \tag{1}$$

He next purported to prove that this is true at every point P of the orbit,

$$v = \frac{k}{r}. \tag{2}$$

However, this “theorem” is false; actually the velocity at P is inversely
proportional (as it turns out) to the perpendicular distance from S to the
tangent line to the ellipse at P (this distance being equal to $r$ only at the
apsides where the tangent line and radius vector happen to be perpendicular).

Nevertheless, Kepler proceeded on the basis of the incorrect relation (2).
In order to calculate the time $t$ required to traverse an arc PQ of its orbit
(Fig. 2), he divided the arc into a large number of subarcs of equal lengths
$\Delta s$. If $r_i$ is the distance $SP_i$ from the sun to the initial point $P_i$ of the ith
subarc $P_iP_{i+1}$, $v_i$ the velocity at $P_i$, and $t_i$ the time required for the planet
to traverse this subarc, then (2) gives

$$t = \sum_{i=1}^{n} t_i = \sum_{i=1}^{n} \frac{\Delta s}{v_i} = \frac{1}{k} \sum_{i=1}^{n} r_i \, \Delta s. \tag{3}$$

Thus the (incorrect) relation $v = k/r$ implies that the time $t$ is proportional
to the sum $\sum r_i$ of the radii.

[Figure 2: the arc PQ of the orbit, subdivided at the points $P_i$, with the sun at S]

At this point Kepler mistakenly assumes that the same sum is propor- 
tional to the area SPQ, saying 


Since I was aware that there exists an infinite number of points on the 
orbit and accordingly an infinite number of distances [from the sun] the 
idea occurred to me that the sum of these distances is contained in the 
area of the orbit. For I remember that in the same manner Archimedes 
too divided the circle into an infinite number of triangles (quoted by 
Koestler [8], p. 327). 


His thought here is reminiscent of the heuristic derivation of the circle area
formula by the fifth century B.C. Greeks (rather than Archimedes). If the
ith piece or “indivisible” of the area were a triangle with base $r_i$ and height
$\Delta s$, then it would follow that

$$A = \frac{1}{2} \sum_{i=1}^{n} r_i \, \Delta s, \tag{4}$$

and this together with (3) would imply that

$$A = ht \quad (h = k/2). \tag{5}$$

This is how Kepler actually obtained his (correct) second law to the effect
that the area swept out is proportional to the time elapsed. For additional
discussion of this comedy of errors, see the article by Aiton [1] or the
books of Dreyer ([7], pp. 387-388) and Koestler ([8], pp. 327-328).

EXERCISE 1. Explain why (4) is false in general. Note that (4) gives

$$A = \tfrac{1}{2} \int r \, ds$$

in integral notation, whereas the (correct) area formula in polar coordinates is

$$A = \tfrac{1}{2} \int r^2 \, d\theta.$$

Does $\Delta s$ equal $r \, \Delta\theta$ for a small segment of an arbitrary curve?

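
The discrepancy in Exercise 1 can be seen numerically on an ellipse with the sun at one focus. The sketch below is an illustration rather than a solution of the exercise: the polar equation $r = p/(1 + e\cos\theta)$ and the sample values p = 1, e = 0.5 are assumptions, and each arc element is approximated by the chord between neighboring points of the orbit.

```python
import math


def r(theta, p=1.0, e=0.5):
    """Polar equation of an ellipse with one focus (the sun) at the origin."""
    return p / (1.0 + e * math.cos(theta))


def compare_area_formulas(n=20_000, p=1.0, e=0.5):
    """Return ((1/2) * sum r^2 dtheta, (1/2) * sum r ds) over one full orbit."""
    dtheta = 2 * math.pi / n
    polar_area = 0.0   # correct polar area formula
    kepler_sum = 0.0   # the sum appearing in formula (4)
    for i in range(n):
        t0, t1 = i * dtheta, (i + 1) * dtheta
        r0, r1 = r(t0, p, e), r(t1, p, e)
        # chord between consecutive orbit points, approximating the arc element ds
        ds = math.hypot(r1 * math.cos(t1) - r0 * math.cos(t0),
                        r1 * math.sin(t1) - r0 * math.sin(t0))
        rm = r(0.5 * (t0 + t1), p, e)
        polar_area += 0.5 * rm * rm * dtheta
        kepler_sum += 0.5 * rm * ds
    return polar_area, kepler_sum


polar_area, kepler_sum = compare_area_formulas()
print(polar_area, kepler_sum)
```

The first number matches the area of the ellipse, while the second is larger. The arc element exceeds $r \, \Delta\theta$ wherever the distance r is changing, so the two sums agree only for a circle centered at S.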
Kepler’s more systematic work on the calculation of areas and volumes
by infinitesimal techniques was undertaken for more prosaic reasons than
the study of the harmony of the celestial spheres, the original motivation
for his astronomical investigations. His treatise Nova stereometria doliorum
vinariorum (New solid geometry of wine barrels), published in 1615, was 
intended to enable wine merchants to accurately gauge the volumes of 
their barrels. This work concentrates on solids of revolution, and includes 
determinations of the (exact or approximate) volumes of over ninety such 
solids. 

Kepler’s approach in the stereometria is to dissect a given solid into an 
(apparently) infinite number of infinitesimal pieces, or solid “indivisibles”,
of a size and shape convenient to the solution of the particular problem.
For example, he regards the sphere as composed of an infinite number of
pyramids, each having its vertex at the center and its base on the surface of
the sphere, and height equal to the radius $r$ of the sphere. Adding up the
volumes of these pyramids, the formula for the volume of a pyramid
immediately gives $V = Ar/3 = 4\pi r^3/3$, where $A = 4\pi r^2$ is the surface area
of the sphere. 


EXERCISE 2. Derive similarly the formula for the volume of a circular cone by 
considering a dissection of its base into infinitely many infinitesimal triangles. 


EXERCISE 3. Consider a cylindrical segment with nonparallel bases as in Figure 3. 
Derive the volume formula $V = \pi r^2 h$ by considering it to be the sum of infinitely
many thin vertical slices as indicated. By the formula for the area of a trapezoid the
volume of such a slice is $bht$, where $t$ is its thickness.


Kepler showed that the volume of an anchor ring or torus, generated by 
revolving a circle of radius a around a vertical axis at a distance b from its 
center, is equal to the product of the area of the circle and the distance 
traveled by its center (the theorem of Pappus). That is, 


$$V = (\pi a^2)(2\pi b) = 2\pi^2 a^2 b. \tag{6}$$


He derived (6) by dissecting the torus into infinitely many thin vertical
circular slices by means of planes through the axis of revolution. Each such
slice is thinner on the inside (nearest the axis) and thicker on the outside
(Fig. 4). Kepler assumes the volume of such a slice is $\pi a^2 t$, where
$t = (t_1 + t_2)/2$, the average of its minimum and maximum thickness. Then $t$ is the
thickness of the slice at its center, so the volume of the torus is
$V = (\pi a^2)(\sum t) = (\pi a^2)(2\pi b)$.

[Figure 4: a thin slice of the torus, with minimum thickness $t_1$ and maximum thickness $t_2$]
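
Kepler's slice argument can be compared with an independent computation. In the sketch below the first function follows his recipe (circle area times central thickness, added over all slices). The second integrates horizontal washers, a modern cross-check that is not part of Kepler's treatise. The function names and the sample values a = 1, b = 3 (with b > a assumed) are illustrative.

```python
import math


def torus_volume_kepler(a, b, m=360):
    """Add m wedge-shaped slices, each taken as (circle area) * (thickness at center)."""
    t_center = b * (2 * math.pi / m)   # arc traveled by the circle's center per slice
    return sum(math.pi * a * a * t_center for _ in range(m))


def torus_volume_washers(a, b, n=100_000):
    """Independent check: integrate horizontal washers pi*(R_out**2 - R_in**2) dz."""
    dz = 2 * a / n
    total = 0.0
    for i in range(n):
        z = -a + (i + 0.5) * dz
        w = math.sqrt(a * a - z * z)   # half-width of the generating circle at height z
        total += math.pi * ((b + w) ** 2 - (b - w) ** 2) * dz
    return total


v_kepler = torus_volume_kepler(1.0, 3.0)
v_washers = torus_volume_washers(1.0, 3.0)
print(v_kepler, v_washers, 2 * math.pi**2 * 1.0**2 * 3.0)
```

The agreement with formula (6) suggests that Kepler's averaging of the inner and outer thickness introduces no error, which is the content of Exercise 4.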


EXERCISE 4. Derive the formula $\pi a^2 t$ for the volume of a vertical circular slice of the
torus, by considering it to be composed of narrow horizontal slices with trapezoidal 
cross-sections. 


EXERCISE 5. Give an alternative derivation of (6) as follows. Dissect the torus into 
infinitely many thin vertical cylindrical shells, corresponding to a dissection of the 
generating circle into narrow vertical rectangular strips (Fig. 5). Calculate pairwise 
the volumes of cylindrical shells corresponding to pairs of strips that are symmetri- 
cally located relative to the vertical diameter of the circle. 


An English translation of part of the stereometria may be found in 
Struik’s source book ([11], pp. 192-197). Confident that his results could be
established rigorously if necessary, Kepler indulged in free play with 
infinitesimals to calculate the volumes of a wide variety of solids of 
revolution, saying “We could obtain absolute and in all respects perfect 
demonstrations from these books of Archimedes themselves, were we not 
repelled by the thorny reading thereof.” 

Cavalieri’s Indivisibles


The systematic use of infinitesimal techniques for area and volume com- 
putation was popularized by two influential books written by Bonaventura 
Cavalieri (1598-1647)—his Geometria indivisibilibus (Geometry of indivisi- 
bles) of 1635 and his Exercitationes geometricae sex (Six geometrical 
exercises) of 1647. English translations of brief but illustrative sections of 
these two lengthy works may be found in Struik’s source book ([11],
pp. 209-219). 

Cavalieri’s methods differed in two significant ways from those of 
Kepler. Firstly, Kepler imagined a given geometrical figure to be decom- 
posed into infinitesimal figures, whose areas or volumes he then added up 
in some ad hoc way to obtain the area or volume of the given figure. 
However, Cavalieri proceeded by setting up a one-to-one correspondence 
between the indivisible elements of two given geometrical figures. If 
corresponding indivisibles of the two given figures had a certain (constant) 
ratio, he concluded that the areas or volumes of the given figures had the 
same ratio. Typically the area or volume of one of the figures was known 
in advance, so this gave the other. 

Secondly, Kepler thought of a geometrical figure as being composed of 
indivisibles of the same dimension (i.e., infinitesimal areas or volumes), as
might be conceived to result from some process of successive subdivision 
leading eventually to ultimate indivisible units. However, Cavalieri gener- 
ally considered a geometrical figure to be composed of an indefinitely 
large number of indivisibles of lower dimension. Thus he regarded an area 
as consisting of parallel and equidistant line segments, and a volume as 
consisting of parallel and equidistant plane sections, without making it 
entirely clear whether these indivisible units have thickness or not. Usually 
they appeared not to, but on at least one occasion he suggested that they 
might, mentioning the analogy of the parallel threads in a piece of cloth, or 
the parallel pages filling up the thickness of a book. In contrast to 
medieval speculators, he was less interested in questions as to the precise 
nature or existence of indivisibles, than in their pragmatic use as a device 
for obtaining computational results. Rigor, he wrote in the Exercitationes, 
is the affair of philosophy rather than mathematics. 

Cavalieri’s method of comparing two geometrical figures by comparing 
the indivisibles of one with the indivisibles of another is based on a 
principle that is still known as Cavalieri’s Theorem: 


If two solids have equal altitudes, and if sections made by planes parallel 
to the bases and at equal distances from them are always in a given ratio, 
then the volumes of the solids are also in this ratio. 


He attempted to prove this theorem by a superposition argument that 
involved moving one of the solids piece by piece so as to superimpose it on 
the other one. 
Figure 6


The practical effect of Cavalieri’s Theorem is to gloss over or “hide” the 
role of limit processes in volume computations. For example, if two 
triangular pyramids have the same altitude h and the same base area A,
then an easy similarity argument shows that their triangular cross-sections 
at equal heights have equal areas, so Cavalieri’s Theorem implies that the 
two pyramids have equal volumes. Recall (from Chapter 1) that this result 
is the principal step in the proof of the pyramid volume formula V = Ah/3. 

To derive the formula for the volume of a circular cone C with base 
radius r and height h, we compare it with a pyramid P with height h and 
unit square base. If C_x and P_x are the sections indicated in Figure 6, then a
similarity computation gives

\[ a(C_x) = \frac{\pi r^2 x^2}{h^2} \quad\text{and}\quad a(P_x) = \frac{x^2}{h^2}. \]

Thus a(C_x) = πr^2 a(P_x), so Cavalieri’s Theorem implies that v(C) =
πr^2 v(P) = πr^2 h/3, since v(P) = h/3.
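The comparison of the cone with the pyramid can be checked numerically. The following is only a modern sketch, not Cavalieri’s own procedure: it replaces indivisibles by thin slabs (a midpoint sum), and the dimensions r = 2, h = 5 and the helper name `slab_volume` are illustrative assumptions.

```python
import math

def slab_volume(section_area, height, slices=20_000):
    """Approximate a volume by stacking thin slabs (midpoint rule)."""
    dx = height / slices
    return sum(section_area((i + 0.5) * dx) for i in range(slices)) * dx

r, h = 2.0, 5.0  # illustrative dimensions, not taken from the text

def cone_section(x):
    # a(C_x): circular section at distance x from the vertex of the cone
    return math.pi * (r * x / h) ** 2

def pyramid_section(x):
    # a(P_x): square section of the pyramid with unit square base
    return (x / h) ** 2

# Corresponding sections are in the constant ratio pi * r^2 ...
section_ratio = cone_section(1.7) / pyramid_section(1.7)

# ... and so are the volumes built up from them.
v_cone = slab_volume(cone_section, h)
v_pyramid = slab_volume(pyramid_section, h)

print(section_ratio, v_cone / v_pyramid, math.pi * r**2)
print(v_cone, math.pi * r**2 * h / 3)
```

The slab sums converge to the true volumes only in the limit, which is exactly the limit process that Cavalieri’s Theorem hides.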


EXERCISE 6. Derive the formula for the volume of a sphere by comparing a 
hemisphere of radius r with the solid that is obtained from a cylinder of radius and 
height r by removing an inverted cone whose base is the top of the cylinder and 
whose vertex is the center of the base of the cylinder. 


EXERCISE 7. A spherical ring is obtained from a solid sphere by boring out a 
cylindrical hole whose axis is the vertical diameter of the sphere. Find the volume 
of the spherical ring by comparing it with a sphere whose diameter is equal to the 
height of the ring. 


EXERCISE 8. Consider the solid intersection of the unit cylinders x^2 + z^2 = 1 and
y^2 + z^2 = 1 along the x- and y-axes. Find its volume by comparing it with the solid
that is obtained from a rectangular parallelepiped with unit square base and height 
2, by removing two square pyramids having the top and bottom of the parallele- 
piped as their bases, and having a common vertex at the center of the parallele- 
piped. Compare your answer with the result Archimedes obtained using his 
mechanical method (Chapter 2). 
Figure 7


In addition to his technique of comparing two solids by comparing their 
cross-sections, Cavalieri devised a method of calculating the volume of a 
single solid in terms of its cross-sections. This latter method was based on 
a formal procedure for computing what may be referred to as “sums of 
powers of lines” in a triangle parallel to its base. This procedure, though 
far from rigorous, led Cavalieri to a correct result equivalent to the basic
integral 


\[ \int_0^a x^n \, dx = \frac{a^{n+1}}{n+1}. \tag{7} \]


For example, consider the triangle ABC with base and height a, and 
typical vertical section PQ of length x (Fig. 7). Then, in Cavalieri’s sense 
of indivisibles, the triangle is the sum of all such segments, so we might 
write 


\[ a(\triangle ABC) = \sum_A^B x, \]


a concise notation for what Cavalieri said in verbose geometrical terminol- 
ogy. If P is a pyramid with vertex A and its base being a square on BC, 
then its cross-section at a distance x from the vertex has area x^2, so we
similarly write 


\[ v(P) = \sum_A^B x^2, \]


thinking of the pyramid as the sum of its cross-sections. The same sum, of 
the squares of the lines in the triangle, represents also the area under the 
parabola y = x^2 (Fig. 8), since its typical vertical section has length x^2. If Q
is the solid obtained by revolving the parabola around its base AB, then its
cross-section at a distance x from the “vertex” A has area πx^4, so we write

\[ v(Q) = \sum_A^B \pi x^4 = \pi \sum_A^B x^4. \]
Figure 8


These examples indicate the way in which a wide variety of area and 
volume problems can be solved in terms of formal sums of powers of lines 
in a triangle. 

To outline Cavalieri’s method for computing these formal sums, we start 
with a square ABCD with edge length a, divided into two triangles by its
diagonal AC (Fig. 9). If x and y denote the lengths of typical sections PQ
and QR of these congruent triangles, then x + y = a, so


\[ \sum_A^B a = \sum_A^B (x+y) = \sum_A^B x + \sum_A^B y = 2\sum_A^B x, \]

because Σx = Σy by symmetry. Hence

\[ \sum_A^B x = \frac{1}{2}\sum_A^B a = \frac{1}{2}a^2, \tag{8} \]

because Σa represents the area of the square.


Figure 9
To compute Σx^2 we start with

\[ \begin{aligned}
\sum_A^B a^2 = \sum_A^B (x+y)^2 &= \sum x^2 + 2\sum xy + \sum y^2 \\
&= 2\sum x^2 + 2\sum xy \quad\text{(symmetry)} \\
&= 2\sum x^2 + 2\sum\left(\frac{a^2}{4} - z^2\right),
\end{aligned} \]

where x = (a/2) − z, y = (a/2) + z (see Fig. 9). Hence

\[ \sum a^2 = 4\sum x^2 - 4\sum z^2. \tag{9} \]

Here Σz^2 is a sum of squares of lines in the two triangles AEF and CFG.
But the sum of z^2 over one of these triangles represents the volume of a
pyramid with dimensions one-half those of the pyramid whose volume is
Σ_A^B x^2. Therefore

\[ \sum z^2 = 2\cdot\frac{1}{8}\sum_A^B x^2 = \frac{1}{4}\sum_A^B x^2. \]

Substitution of this result into (9) gives

\[ \sum_A^B x^2 = \frac{1}{3}\sum a^2 = \frac{1}{3}a^3, \tag{10} \]

because Σa^2 represents the volume of a cube with edge a.
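Proceeding to sums of cubes of lines, we have

\[ \begin{aligned}
\sum a^3 = \sum (x+y)^3 &= \sum x^3 + 3\sum x^2 y + 3\sum x y^2 + \sum y^3, \\
\sum a^3 &= 2\sum x^3 + 6\sum x^2 y \quad\text{(symmetry)}.
\end{aligned} \tag{11} \]

To evaluate Σx^2 y we resort to the following trick.

\[ \begin{aligned}
\sum a^3 = a\sum a^2 &= a\left(2\sum x^2 + 2\sum xy\right) \\
&= a\left(\frac{2}{3}\sum a^2 + 2\sum xy\right) \\
&= \frac{2}{3}\sum a^3 + 2\sum (x+y)xy, \\
\sum a^3 &= \frac{2}{3}\sum a^3 + 4\sum x^2 y \quad\text{(symmetry)}.
\end{aligned} \]

Therefore Σx^2 y = Σa^3/12, and substitution of this result into (11) gives

\[ \sum_A^B x^3 = \frac{1}{4}a^4. \tag{12} \]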
"A 4 
Formulas (8), (10) and (12) are the first three instances of the general
formula 


\[ \sum_A^B x^n = \frac{a^{n+1}}{n+1}, \tag{13} \]


equivalent to (7), which Cavalieri inferred after verifying it case by case up 
to n=9. On the basis of this result, he could immediately write down the 
area under the curve y = x^n (n a positive integer) over the unit interval,

\[ \sum_0^1 x^n = \frac{1}{n+1}, \]

and the volume of the solid obtained by revolving this area around the
x-axis,

\[ \sum_0^1 \pi x^{2n} = \frac{\pi}{2n+1}. \]


This unification and generalization of previous results constituted a giant 
step towards the development of the algorithmic procedures of the calcu- 
lus. 
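Cavalieri’s formal sums can be imitated numerically by replacing "all the lines" of the square with a large but finite number of equally spaced sections. This is a modern stand-in under stated assumptions (midpoint sections, edge a = 3, the helper names `sections` and `power_ratio`), not a reconstruction of his geometric argument.

```python
def sections(a, count):
    """Abscissas x of `count` equally spaced sections across a square of edge a."""
    return [a * (j + 0.5) / count for j in range(count)]

a = 3.0            # illustrative edge length
xs = sections(a, 20_000)

def power_ratio(n):
    """Ratio (sum of x^n) / (sum of a^n), expected to be close to 1/(n+1)."""
    return sum(x**n for x in xs) / sum(a**n for _ in xs)

for n in (1, 2, 3, 4):
    print(n, power_ratio(n), 1 / (n + 1))

# The auxiliary sum used for the cubes: sum of x^2 y with y = a - x,
# expected to be close to (sum of a^3) / 12.
mixed_ratio = sum(x * x * (a - x) for x in xs) / sum(a**3 for _ in xs)
print(mixed_ratio, 1 / 12)
```

Multiplying each ratio by the "sum of a^n", that is by a^(n+1), recovers the right-hand side of (13).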


EXERCISE 9. Expand Σa^4 = Σ(x + y)^4 to derive the result

\[ \sum_A^B x^4 = \frac{1}{5}a^5. \]


EXERCISE 10. Consider the area in the xy-plane bounded by the line y = 1 and the 
parabola y = x^2. Let P be the solid obtained by revolving this area around the line
y = 1. Considering P as the sum of its circular cross-sections, apply Cavalieri’s
results to obtain v(P) = 16π/15.


Arithmetical Quadratures 


We saw in Chapter 2 that Archimedes used the formulas for sums of 
integers and their squares, 


\[ 1 + 2 + \cdots + n = \frac{n}{2}(n+1) \tag{14} \]

and

\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n}{6}(n+1)(2n+1), \tag{15} \]


to establish quadrature results equivalent to the integrals 


\[ \int_0^a x \, dx = \frac{a^2}{2} \quad\text{and}\quad \int_0^a x^2 \, dx = \frac{a^3}{3}. \]
Actually, all that is needed for these two quadratures are the immediate
consequences 


\[ \lim_{n\to\infty} \frac{1 + 2 + \cdots + n}{n^2} = \frac{1}{2} \]

and

\[ \lim_{n\to\infty} \frac{1^2 + 2^2 + \cdots + n^2}{n^3} = \frac{1}{3} \]


of formulas (14) and (15). 

During the two decades following the publication of Cavalieri’s first 
book in 1635, the French mathematicians Fermat, Pascal, and Roberval 
gave more or less rigorous proofs of Cavalieri’s (conjectured) general 
formula 


\[ \int_0^a x^k \, dx = \frac{a^{k+1}}{k+1} \tag{16} \]

for the area under the generalized parabola y = x^k (k a positive integer).
Each of their proofs made use of the limit

\[ \lim_{n\to\infty} \frac{1^k + 2^k + \cdots + n^k}{n^{k+1}} = \frac{1}{k+1}, \tag{17} \]

involving the sum of the kth powers of the first n positive integers, to 
replace Cavalieri’s intuitive arguments in terms of geometrical indivisibles 
with explicit arithmetical computations. 

To see how it is that the arithmetical limit (17) implies the area formula 
(16), we subdivide the interval [0, a] into n equal subintervals of length 
a/n, and construct the usual inscribed and circumscribed polygons P_n and
Q_n (Fig. 10). P_n consists of rectangles with base a/n and heights
(a/n)^k, (2a/n)^k, ..., ((n−1)a/n)^k, and Q_n consists of rectangles with
base a/n and heights (a/n)^k, (2a/n)^k, ..., (na/n)^k. Adding up the areas
of these rectangles, we find that

\[ a(P_n) = \frac{a^{k+1}}{n^{k+1}}\left(1^k + 2^k + \cdots + (n-1)^k\right) \]

and

\[ a(Q_n) = \frac{a^{k+1}}{n^{k+1}}\left(1^k + 2^k + \cdots + n^k\right). \]

Figure 10

Denoting by S the region under the curve y = x^k over the interval [0, a],
we see that

\[ a^{k+1}\,\frac{1^k + 2^k + \cdots + (n-1)^k}{n^{k+1}} < a(S) < a^{k+1}\,\frac{1^k + 2^k + \cdots + n^k}{n^{k+1}}. \]

Taking limits as n → ∞, it now follows from (17) that a(S) = a^{k+1}/(k + 1)
as desired. 
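The squeeze between the inscribed and circumscribed polygons can be watched directly. The sketch below uses exact rational arithmetic; the function name `polygon_areas` and the sample values a = 2, k = 3 are illustrative choices, not part of the historical argument.

```python
from fractions import Fraction

def polygon_areas(a, k, n):
    """Areas a(P_n) and a(Q_n) of the rectangle polygons for y = x**k on [0, a]."""
    a = Fraction(a)
    scale = a ** (k + 1) / Fraction(n) ** (k + 1)
    lower = scale * sum(i**k for i in range(1, n))        # 1^k + ... + (n-1)^k
    upper = scale * sum(i**k for i in range(1, n + 1))    # 1^k + ... + n^k
    return lower, upper

a, k = 2, 3
exact = Fraction(a) ** (k + 1) / (k + 1)                  # a^(k+1)/(k+1)

for n in (10, 100, 1000):
    lower, upper = polygon_areas(a, k, n)
    print(n, float(lower), float(exact), float(upper))
```

The gap a(Q_n) − a(P_n) is exactly a^(k+1)/n, so both sequences are forced to the common limit.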


EXERCISE 11. Show that (16) also follows from the inequality

\[ 1^k + 2^k + \cdots + (n-1)^k < \frac{n^{k+1}}{k+1} < 1^k + 2^k + \cdots + n^k. \]

Fermat derived formulas for Σ_{i=1}^n i^k from a theorem concerning figurate
numbers. The nth triangular number (the first type of figurate numbers) is

\[ 1 + 2 + \cdots + n = \frac{n}{2}(n+1). \]


The nth pyramidal number is the sum of the first n triangular numbers. In 
general, the nth figurate number of type k is the sum of the first n figurate
numbers of type k−1. In a letter written in 1636 (see Mahoney [9],
pp. 229-232), Fermat stated without proof that the nth figurate number of
type k is given by

\[ \sum_{i=1}^{n} \frac{i(i+1)\cdots(i+k-1)}{k!} = \frac{n(n+1)\cdots(n+k)}{(k+1)!}. \tag{18} \]


EXERCISE 12. Regarding k as a fixed positive integer, prove formula (18) by 
induction on n. 


Let us write 
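\[ i(i+1)\cdots(i+k-1) = i^k + a_1 i^{k-1} + \cdots + a_{k-1} i, \tag{19} \]

where the coefficients a_1, ..., a_{k−1} are constants (depending, however,
upon k). Then (18) becomes

\[ \frac{1}{k!}\left[\sum_{i=1}^n i^k + a_1 \sum_{i=1}^n i^{k-1} + \cdots + a_{k-1}\sum_{i=1}^n i\right] = \frac{n(n+1)\cdots(n+k)}{(k+1)!}, \]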
from which we can solve for the recursion formula

\[ \sum_{i=1}^n i^k = \frac{n(n+1)\cdots(n+k)}{k+1} - a_1\sum_{i=1}^n i^{k-1} - \cdots - a_{k-1}\sum_{i=1}^n i, \tag{20} \]

which gives the sum of the kth powers of the first n integers in terms of the
sums of lower powers. 
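Recursion (20) is mechanical enough to run. The following sketch (a modern illustration, with the hypothetical helper names `rising_coeffs` and `power_sum`) obtains the coefficients a_1, ..., a_{k−1} of (19) by multiplying out i(i+1)···(i+k−1) and then applies (20).

```python
from fractions import Fraction
from math import prod

def rising_coeffs(k):
    """Coefficients [1, a_1, ..., a_{k-1}] of i(i+1)...(i+k-1), highest power first."""
    coeffs = [1]                                    # the constant polynomial 1
    for m in range(k):                              # multiply by (i + m)
        shifted = coeffs + [0]                      # coeffs times i
        scaled = [0] + [m * c for c in coeffs]      # coeffs times m
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs[:-1]                              # constant term is 0, drop it

def power_sum(k, n):
    """1**k + 2**k + ... + n**k computed from recursion (20)."""
    a = rising_coeffs(k)                            # a[j] goes with the (k-j)th powers
    lead = Fraction(prod(range(n, n + k + 1)), k + 1)   # n(n+1)...(n+k)/(k+1)
    return lead - sum(a[j] * power_sum(k - j, n) for j in range(1, k))

print(rising_coeffs(3))                             # i(i+1)(i+2) = i^3 + 3 i^2 + 2 i
print([power_sum(k, 10) for k in (1, 2, 3, 4)])
```

For k = 1 the list of lower sums is empty, so the recursion starts by itself from n(n+1)/2.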


EXERCISE 13. Starting with Σ_{i=1}^n i = n(n + 1)/2, apply the recursion formula (20) to
compute Σ_{i=1}^n i^2, Σ_{i=1}^n i^3, and Σ_{i=1}^n i^4. You will have to compute the a_j’s from (19),
separately for each value of k.


EXERCISE 14. Apply (20) to prove by induction on k that

\[ \sum_{i=1}^n i^k = \frac{n^{k+1}}{k+1} + \text{lower powers of } n. \]

Note that this fact suffices to establish the limit in (17).


In 1654 Blaise Pascal discovered the following more explicit recursion 
formula for sums of kth powers: 


\[ \binom{k+1}{k}\sum_{i=1}^n i^k + \binom{k+1}{k-1}\sum_{i=1}^n i^{k-1} + \cdots + \binom{k+1}{1}\sum_{i=1}^n i = (n+1)^{k+1} - (n+1), \tag{21} \]

where \binom{p}{q} = p!/q!(p − q)! is the usual binomial coefficient. See the article
of Boyer ([5], p. 239) for Pascal’s rhetorical statement of this formula,
which he deduced by incomplete induction from number relationships in
the “Pascal triangle”.
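Pascal’s relation determines each power sum from the lower ones, since the coefficient of the kth power sum in (21) is k + 1. A minimal sketch, assuming the form of (21) given above (the function name `pascal_power_sums` is an illustrative choice):

```python
from fractions import Fraction
from math import comb

def pascal_power_sums(k_max, n):
    """Return [S_1, ..., S_k_max] with S_k = 1**k + ... + n**k, using (21)."""
    sums = []
    for k in range(1, k_max + 1):
        rhs = (n + 1) ** (k + 1) - (n + 1)
        # Contribution of the lower power sums S_1, ..., S_{k-1}.
        lower = sum(comb(k + 1, p) * sums[p - 1] for p in range(1, k))
        # The remaining term is binom(k+1, k) * S_k = (k + 1) * S_k.
        sums.append(Fraction(rhs - lower) / (k + 1))
    return sums

print(pascal_power_sums(4, 10))
```

Unlike (20), this version needs no auxiliary coefficients beyond the binomial coefficients themselves.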


EXERCISE 15. Apply the binomial formula to establish formula (21), starting with 
the following trick. 


\[ (n+1)^{k+1} - 1 = \sum_{i=1}^n \left[(i+1)^{k+1} - i^{k+1}\right] = \sum_{i=1}^n \left[\sum_{p=0}^{k} \binom{k+1}{p} i^p\right]. \]


EXERCISE 16. Starting with Σ_{i=1}^n i = n(n + 1)/2, apply the recursion formula (21) to
compute Σ_{i=1}^n i^2, Σ_{i=1}^n i^3, and Σ_{i=1}^n i^4.


EXERCISE 17. Use the binomial formula to expand (n + 1)^{k+1} on the right-hand side
of (21), and then prove by induction on k that

\[ \sum_{i=1}^n i^k = \frac{n^{k+1}}{k+1} + \frac{n^k}{2} + \text{lower powers of } n. \]
EXERCISE 18. Deduce from the previous exercise that, if n is sufficiently large, then

\[ 1^k + 2^k + \cdots + (n-1)^k < \frac{n^{k+1}}{k+1} < 1^k + 2^k + \cdots + n^k. \]


After stating formula (21) in verbal form, Pascal went on to remark that 


Any person at all familiar with the doctrine of indivisibles will perceive 
the results that one can draw from the above for the determination of 
curvilinear areas. Nothing is easier, in fact, than to obtain immediately 
the quadratures of all the types of parabolas and the measures of 
numberless other magnitudes. 

If then we extend to continuous quantities the results found for numbers,
we will be able to state the following rules: ... The sum of a certain
number of lines is to the square of the largest as 1 is to 2. The sum of the
squares is to the cube of the largest as 1 is to 3. ... The sum of like
powers of a certain number of lines is to the power of the next higher
degree of the greatest of these as unity is to the exponent of this latter
power (quoted from Boyer [5], p. 240).


Pascal’s idea here appears to be that, when n is very large, the lower
powers of n are negligible in comparison with the first term n^{k+1}/(k + 1)
in the formula for Σ_{i=1}^n i^k. Therefore, when the area under the curve y = x^k
over [0, a] is subdivided into a very large number n of (almost) rectangular
strips of width w = a/n, it seems apparent that the area under the curve is

\[ \left[w^k + (2w)^k + \cdots + (nw)^k\right] w = w^{k+1}\sum_{i=1}^n i^k \approx \frac{(nw)^{k+1}}{k+1} = \frac{a^{k+1}}{k+1}. \]


This is essentially an abbreviation of our earlier derivation of this result by 
the method of exhaustion. 


The Integration of Fractional Powers 


The quadrature of curves of the form y = x^k, with k not necessarily a
positive integer, was first attacked systematically by John Wallis 
(1616-1703) who was the Savilian professor of geometry at Oxford. In fact, 
rational and negative exponents were introduced by Wallis in his 
Arithmetica Infinitorum (The Arithmetic of Infinites) of 1655, which (as we 
will see in Chapter 7) had a decisive influence on Newton’s early mathe- 
matical development. 

On the basis of computations with arithmetical indivisibles similar to 
those described in the previous section, Wallis knew that the area under 
the curve y = x^k (k a positive integer) over the unit interval is given by

\[ \int_0^1 x^k \, dx = \lim_{n\to\infty} \frac{0^k + 1^k + \cdots + n^k}{n^k + n^k + \cdots + n^k}. \]

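His approach to the determination of this limit was empirical. For example,
in the case k = 3 he noted that

\[ \begin{aligned}
\frac{0^3 + 1^3}{1^3 + 1^3} &= \frac{2}{4} = \frac{1}{4} + \frac{1}{4}, \\
\frac{0^3 + 1^3 + 2^3}{2^3 + 2^3 + 2^3} &= \frac{9}{24} = \frac{1}{4} + \frac{1}{8}, \\
\frac{0^3 + 1^3 + 2^3 + 3^3}{3^3 + 3^3 + 3^3 + 3^3} &= \frac{36}{108} = \frac{1}{4} + \frac{1}{12}, \\
\frac{0^3 + 1^3 + \cdots + 4^3}{4^3 + 4^3 + \cdots + 4^3} &= \frac{100}{320} = \frac{1}{4} + \frac{1}{16}, \\
\frac{0^3 + 1^3 + \cdots + 5^3}{5^3 + 5^3 + \cdots + 5^3} &= \frac{225}{750} = \frac{1}{4} + \frac{1}{20}, \\
\frac{0^3 + 1^3 + \cdots + 6^3}{6^3 + 6^3 + \cdots + 6^3} &= \frac{441}{1512} = \frac{1}{4} + \frac{1}{24}.
\end{aligned} \]

On the basis of this numerical evidence he concluded that

\[ \frac{0^3 + 1^3 + \cdots + n^3}{n^3 + n^3 + \cdots + n^3} = \frac{1}{4} + \frac{1}{4n}, \]

so the limit as n → ∞ is 1/4. After carrying out such computations for several
small values of k he inferred (without further proof) that

\[ \lim_{n\to\infty} \frac{0^k + 1^k + \cdots + n^k}{n^k + n^k + \cdots + n^k} = \frac{1}{k+1} \tag{22} \]
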
for all non-negative integral values of k. 
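Wallis’ table is easy to regenerate and extend with exact fractions. The sketch below is a modern check of his empirical pattern, not his method; the function name `wallis_ratio` is an illustrative choice, and the denominator is taken to have n + 1 equal terms, matching the n + 1 terms of the numerator.

```python
from fractions import Fraction

def wallis_ratio(k, n):
    """(0**k + 1**k + ... + n**k) / (n**k + n**k + ... + n**k), n + 1 terms each."""
    numerator = sum(i**k for i in range(n + 1))
    denominator = (n + 1) * n**k
    return Fraction(numerator, denominator)

# Case k = 3: the excess over 1/4 is exactly 1/(4n).
for n in range(1, 7):
    ratio = wallis_ratio(3, n)
    print(n, ratio, ratio - Fraction(1, 4))

# For other small k the ratios drift toward 1/(k+1) as n grows.
for k in (1, 2, 4):
    print(k, float(wallis_ratio(k, 2000)), 1 / (k + 1))
```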

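In order to describe his next step, let us define the index I{φ} of a
function φ by the equation

\[ \lim_{n\to\infty} \frac{\varphi(0) + \varphi(1) + \cdots + \varphi(n)}{\varphi(n) + \varphi(n) + \cdots + \varphi(n)} = \frac{1}{I\{\varphi\} + 1}, \tag{23} \]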


assuming the limit exists. Then equation (22) simply says that the index of
φ(x) = x^k is I{x^k} = k. Wallis then noted that, given a geometric progression
of positive integral powers of x (such as 1, x^2, x^4, x^6), the corresponding
sequence of indexes is an arithmetic progression (0, 2, 4, 6). From this
trivial observation he leapt boldly to the assumption that the same conclusion
would follow for a geometric progression such as

\[ 1, \ \sqrt[q]{x}, \ (\sqrt[q]{x})^2, \ \ldots, \ (\sqrt[q]{x})^{q-1}, \ x. \]

That is, the sequence of indexes

\[ I\{1\} = 0, \ I\{\sqrt[q]{x}\}, \ \ldots, \ I\{(\sqrt[q]{x})^{q-1}\}, \ I\{x\} = 1 \]

should be an arithmetic progression, so it would follow that

\[ I\{(\sqrt[q]{x})^p\} = \frac{p}{q} \quad (p \text{ and } q \text{ integers}) \]

and hence from (23) that

\[ \lim_{n\to\infty} \frac{(\sqrt[q]{0})^p + (\sqrt[q]{1})^p + \cdots + (\sqrt[q]{n})^p}{(\sqrt[q]{n})^p + (\sqrt[q]{n})^p + \cdots + (\sqrt[q]{n})^p} = \frac{1}{(p/q)+1} = \frac{q}{p+q}. \tag{24} \]

It was on these highly speculative grounds that Wallis associated the index
or exponent p/q with the power (\sqrt[q]{x})^p, leading to the now standard
notation (\sqrt[q]{x})^p = x^{p/q}. He also introduced irrational exponents, asserting
that “If we suppose the index irrational, say √3, then the ratio is as 1 to
1 + √3, etc.”


EXERCISE 19. Show by exhaustion or indivisibles that the area under the curve 
y = x^{p/q} over the unit interval is equal to the limit in Equation (24).


Wallis was able to verify (24) only for the special case p = 1. In this case 
it follows from Figure 11 and Exercise 19 that 


\[ \int_0^1 x^{1/q}\,dx + \int_0^1 x^q\,dx = 1, \]

so

\[ \int_0^1 x^{1/q}\,dx = 1 - \frac{1}{q+1} = \frac{q}{q+1} = \frac{1}{(1/q)+1}, \]

as desired.


Figure 11
Figure 12


See the article by Nunn [10] for an English paraphrase of the Arithmetica 
Infinitorum. Struik’s source book also contains a translation of part of this 
work ([11], pp. 244-247). Wallis’ conjecture that

\[ \int_0^a x^{p/q}\,dx = \frac{a^{(p/q)+1}}{(p/q)+1} = \frac{q}{p+q}\,a^{(p+q)/q} \tag{25} \]

if p/q is a positive rational number was established by Fermat and by
Evangelista Torricelli (1608-1647), who was a disciple of Galileo and
Cavalieri. Although the investigations of Fermat and Torricelli predated
that of Wallis (see, for example, Mahoney’s account of Fermat’s work, [9],
pp. 243-252), they were not published until somewhat later.

Fermat began by subdividing the interval [0, a] into an infinite sequence
of subintervals with endpoints {x_n}_0^∞, where x_n = ar^n and 0 < r < 1. If a
rectangle with height x_n^{p/q} is erected on the nth subinterval [x_{n+1}, x_n] (see
Fig. 12), then the sum of the areas of this sequence of rectangles is

\[ \begin{aligned}
A(r) &= \sum_{n=0}^{\infty} x_n^{p/q}(x_n - x_{n+1}) \\
&= \sum_{n=0}^{\infty} (ar^n)^{p/q}(ar^n - ar^{n+1}) \\
&= a^{(p+q)/q}(1-r)\sum_{n=0}^{\infty} r^{n(p+q)/q} \\
&= a^{(p+q)/q}(1-r)\sum_{n=0}^{\infty} s^n \qquad (s = r^{(p+q)/q}) \\
&= a^{(p+q)/q}\,\frac{1-r}{1-s} \\
&= a^{(p+q)/q}\,\frac{1-t^q}{1-t^{p+q}} \qquad (t = r^{1/q}),
\end{aligned} \]

\[ A(r) = a^{(p+q)/q}\,\frac{1 + t + \cdots + t^{q-1}}{1 + t + \cdots + t^{p+q-1}}, \tag{26} \]

since (1 − t)(1 + t + ··· + t^{k−1}) = 1 − t^k. Now the area under the curve is
evidently the limit of A(r) as r approaches 1 (and hence t → 1), so equation
(25) follows by taking the limit in (26). 
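Fermat’s geometric subdivision can be tried numerically. The sketch below truncates the infinite sum after a fixed number of terms (an assumption that is harmless here because the terms decay geometrically); the function name `fermat_area` and the sample values a = 2, p/q = 2/3 are illustrative.

```python
def fermat_area(a, p, q, r, terms=50_000):
    """Sum of rectangle areas x_n**(p/q) * (x_n - x_{n+1}) with x_n = a * r**n."""
    total = 0.0
    x = a
    for _ in range(terms):
        x_next = x * r
        total += x ** (p / q) * (x - x_next)
        x = x_next
    return total

a, p, q = 2.0, 2, 3
exact = q / (p + q) * a ** ((p + q) / q)        # right-hand side of (25)

for r in (0.5, 0.9, 0.99, 0.999):
    print(r, fermat_area(a, p, q, r), exact)
```

Because each rectangle takes its height at the right-hand endpoint, the sums overestimate the area and decrease toward it as r approaches 1.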


EXERCISE 20. Show similarly that the area under the generalized hyperbola y =
x^{−p/q} (p/q > 1) over the semi-infinite interval [a, ∞) is given by

\[ \int_a^\infty x^{-p/q}\,dx = \frac{q}{p-q}\,a^{-(p-q)/q}. \]

Subdivide the base [a, ∞) into an infinite sequence of subintervals with endpoints
{x_n}_0^∞, where x_n = ar^n and now r > 1.


EXERCISE 21. (a) Consider the generalized hyperbola y = x^{−p/q}, where p/q is a
positive rational number not equal to one, and the areas A_1, A_2, A_3, A_4, A_5
indicated in Figure 13. Torricelli showed by an exhaustion proof that


——_—_—_——_ = P * 
A,+Ag @q © 
Use calculus to verify (*) by computing the integrals 


A, + Ay = jee dx 
a 


and 


a7P/4q 


A, + Aq =f y—4/P dy. 


b-P/4 


(b) Derive the integral 


—(pP-9)/9 — g-@-D/49 
[Pxrel4 ax =A,+A,= alesse 
a (q—-P)/4q 


from (*) and the obvious fact that 


A, + A, + A3+ Ag+ As = ba~?/4, 


Figure 13 
The First Rectification of a Curve


We have seen that the quadrature of certain curvilinear figures such as 
segments of parabolas dates back to ancient times. However, it was long 
thought that a segment of an algebraic curve could never have the same 
length as a constructible straight line segment. That is, the rectification 
problem—of constructing a straight line segment equal in length to a given 
curve—was thought to be impossible for algebraic curves. But in the late 
1650s infinitesimal techniques were applied to show that this pessimism 
had been unjustified. 

The first rectification of a curve was that of the “semi-cubical parabola” 
y^2 = x^3 in 1657 by the Englishman William Neil (who was then twenty
years old, and apparently was never heard from again). To describe his
procedure for computing the length of the segment of this curve (Fig. 14)
that lies over the interval 0 ≤ x ≤ a, we subdivide this interval into an
indefinitely large number n of infinitesimal subintervals, the ith one being
[x_{i−1}, x_i]. If s_i denotes the length of the (almost straight) piece of the curve
y = x^{3/2} joining the corresponding points (x_{i−1}, y_{i−1}) and (x_i, y_i) then

\[ s_i \approx \left[(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2\right]^{1/2}, \tag{27} \]

so the length of the curve is given by

\[ s \approx \sum_{i=1}^n \left[(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2\right]^{1/2}. \tag{28} \]


In order to compute the sum in (28), Neil introduced as an auxiliary
curve the parabola z = x^{1/2} (Fig. 15). If A_i denotes the area under this
parabola over the interval [0, x_i], then we know from the general quadrature
result (25) that A_i = 2x_i^{3/2}/3. Therefore we obtain

\[ y_i - y_{i-1} = \frac{3}{2}(A_i - A_{i-1}) \approx \frac{3}{2}\,x_i^{1/2}(x_i - x_{i-1}) \tag{29} \]

by approximating the strip of area over [x_{i−1}, x_i] with a rectangle of height
z_i = x_i^{1/2}. Substitution of (29) into (28) then gives

\[ s \approx \sum_{i=1}^n \left[1 + \left(\frac{3}{2}x_i^{1/2}\right)^2\right]^{1/2}(x_i - x_{i-1}) = \sum_{i=1}^n \frac{3}{2}\left(x_i + \frac{4}{9}\right)^{1/2}(x_i - x_{i-1}). \tag{30} \]


Figure 14


Figure 15


At this point we recognize the sum in (30) as that which gives (in the
limit) the area of the segment of the parabola y = (3/2)(x + 4/9)^{1/2} lying
over the interval [0, a]. By translation this is the same as the area of the
segment of the parabola y = 3x^{1/2}/2 lying over the interval [4/9, a + 4/9]
(see Fig. 16). From the general quadrature result (25) we therefore obtain

\[ s = \frac{3}{2}\left[\frac{2}{3}\left(a + \frac{4}{9}\right)^{3/2} - \frac{2}{3}\left(\frac{4}{9}\right)^{3/2}\right] = \frac{(9a+4)^{3/2} - 8}{27}. \]


Figure 16
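Neil’s closed form can be compared with the polygonal sum (28) itself. This is a modern numerical check, not part of Neil’s argument; the function names and the choice a = 1 are illustrative assumptions.

```python
import math

def polygon_length(a, n):
    """Length of the inscribed polygon for y = x**1.5 over [0, a], as in (28)."""
    total = 0.0
    for i in range(1, n + 1):
        x0, x1 = a * (i - 1) / n, a * i / n
        total += math.hypot(x1 - x0, x1**1.5 - x0**1.5)
    return total

def neil_length(a):
    """Closed form ((9a + 4)^(3/2) - 8) / 27 for the same arc."""
    return ((9 * a + 4) ** 1.5 - 8) / 27

a = 1.0
for n in (10, 100, 10_000):
    print(n, polygon_length(a, n), neil_length(a))
```

The polygon lengths increase toward the closed-form value as the subdivision is refined.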
It is instructive to phrase Neil’s procedure in more general terms. In
order to calculate the length of the curve y = f(x) over [0, a], we first need
an auxiliary curve z = g(x) for which the area A_i over [0, x_i] is

\[ A_i = \int_0^{x_i} g(x)\,dx = f(x_i) = y_i. \tag{31} \]

Then it follows that

\[ y_i - y_{i-1} = A_i - A_{i-1} \approx g(x_i)(x_i - x_{i-1}), \]

so “characteristic triangles” give

\[ s \approx \sum_{i=1}^n \left[1 + (g(x_i))^2\right]^{1/2}(x_i - x_{i-1}), \]

\[ s = \int_0^a \sqrt{1 + [g(x)]^2}\,dx. \]

Thinking of the fundamental theorem of calculus, we see that the proper
choice of the auxiliary curve (so as to give (31)) is g(x) = f′(x). Thus a
combination of quadratures and tangents via the characteristic triangle is
implicit in Neil’s construction for the particular case f(x) = x^{3/2}.


EXERCISE 22. Apply this method to show that the rectification of a parabola is 
equivalent to the quadrature of a hyperbola. In particular, the length s of y = x^2
over [0, a] is given by

\[ s = \int_0^a \sqrt{1 + 4x^2}\,dx. \]


Note: Prof. Jon V. Pepper has pointed out to me that, a half century or more 
before the time of Neil, Thomas Harriot had computed the length of an 
equiangular spiral, albeit by a limit of sums method rather than by an 
inverse tangent approach. See his excellent study of Harriot’s work in
“Harriot’s calculation of the meridional parts as logarithmic tangents”, Arch 
Hist Exact Sci 4, 359-413, 1968. 


Summary 


During the middle decades of the seventeenth century infinitesimal tech- 
niques or indivisibles, motivated by attempts to relax the rigor of the 
classical method of exhaustion, were applied to establish the basic quadra- 
ture result 


\[ \int_0^a x^k\,dx = \frac{a^{k+1}}{k+1}. \]


It was this result itself, rather than the particular methods used to derive it, 
that was of lasting importance. For by 1660 the early direct methods of 
quadrature were rapidly approaching obsolescence, soon to be superseded 
by indirect methods based on the interplay between quadrature and
tangent methods. Neil’s rectification was, at least implicitly, an early (and 
perhaps the first) example of this interplay between the two distinct aspects 
of the emerging calculus. 


References 


[1] E. J. Aiton, Kepler’s second law of planetary motion. Isis 60, 75-90, 1969.
[2] M. E. Baron, The Origins of the Infinitesimal Calculus. Oxford: Pergamon, 
1969, Chapters 3-6. 
[3] C. B. Boyer, The History of the Calculus. New York: Dover, 1959, Chapter 4. 
[4] C. B. Boyer, Cavalieri, limits and discarded infinitesimals. Scr Math 8, 79-91, 
1941. 
[5] C. B. Boyer, Pascal’s formula for the sums of the powers of the integers. Scr 
Math 9, 237-244, 1943. 
[6] S. A. Christensen, The first determination of the length of a curve. Bibl Math 
N.S. 1, 76-80, 1887.
[7] J. L. E. Dreyer, A History of Astronomy from Thales to Kepler. New York: 
Dover, 1953. 
[8] A. Koestler, The Sleepwalkers. New York: Macmillan, 1968. 
[9] M. S. Mahoney, The Mathematical Career of Pierre de Fermat. Princeton, NJ: 
Princeton University Press, 1973, Chapter 5. 
[10] T. P. Nunn, The arithmetic of infinites. Math Gaz 5, 345-356, 1909-1911.
[11] D. J. Struik, A Source Book in Mathematics, 1200-1800. Cambridge, MA:
Harvard University Press, 1969. 
[12] D. T. Whiteside, Patterns of mathematical thought in the later 17th century. 
Arch Hist Exact Sci 1, 179-388, 1960-1962. 


Early Tangent Constructions 


Introduction 


In modern calculus courses the treatment of differentiation and the con- 
struction of tangent lines to curves usually precede the treatment of 
integration and the calculation of areas under curves. This is a reversal of 
the historical sequence of discovery; as we have seen in the preceding 
chapters, the calculation of curvilinear areas dates back to ancient times. 
However, apart from simple constructions of tangent lines to conic sec- 
tions (with the static Greek view of a tangent line as a line touching the 
curve in only one point), and the isolated example of Archimedes’ con- 
struction of the tangent to his spiral, tangent lines were not studied until 
the middle decades of the seventeenth century. 

Then, beginning about 1635, a number of different methods for the 
construction of tangent lines to general curves were rapidly discovered and 
investigated. It was the combination of these new tangent methods with 
area problems and techniques, during the last third of the seventeenth 
century, that produced the calculus as a new unified method of mathemati- 
cal analysis. 


Fermat’s Pseudo-equality Methods 


Fermat was the first to solve maximum-minimum problems by somehow 
taking into account the characteristic behavior of a function near its 
extreme values. For example, in order to determine how to subdivide a 
segment of length b into two segments x and b—x whose product 
x(b − x) = bx − x^2 is maximal (that is, to find the rectangle with perimeter
2b that has maximal area), he proceeded as follows. First he substituted
x + e (he used A, E instead of x, e) for the unknown x, and then wrote
down the following “pseudo-equality” to compare the resulting expression
with the original one:

\[ b(x+e) - (x+e)^2 = bx + be - x^2 - 2xe - e^2 \sim bx - x^2. \]

After cancelling equal terms, he divided through by e to obtain

\[ 2x + e \sim b. \]

Finally he discarded the remaining term containing e, transforming the
pseudo-equality into the true equality

\[ x = \frac{b}{2} \]

that gives the value of x which makes bx − x^2 maximal.

Unfortunately, Fermat never explained the logical basis for this method 
with sufficient clarity or completeness to prevent disagreements between 
historical scholars as to precisely what he meant or intended. Two recent 
and contrasting views may be found in the book by Mahoney ([5], Chapter
4) and the article by Strømholm [6].

An explanation, one that perhaps is closer to modern perceptions than 
those of Fermat, might be given as follows. If f(x) is a maximum (or 
minimum) value of the function f, then it seems on intuitive or pictorial 
grounds that the value of f changes very slowly near x (see Figure 1). 
Hence, if e is quite small, then f(x) and f(x + e) are approximately equal,

\[ f(x+e) \sim f(x), \qquad f(x+e) - f(x) \sim 0. \]

If f(x) is a polynomial, then f(x + e) − f(x) will be divisible by e, so we
carry out this division, obtaining

\[ \frac{f(x+e) - f(x)}{e} \sim 0. \]


Figure 1


Figure 2


But the limit of this quotient as e—>0 is the modern definition of the 
derivative. Consequently Fermat’s suppression of the remaining terms that 
involve e amounts to writing f’(x) = 0. 

However, it must be emphasized that Fermat did not explicitly require 
that e be “small”, and said nothing at all about taking the limit as e
approaches 0. On at least one occasion, he treated x and x + e in a purely 
algebraic manner as distinct roots of the equation f(x)=c (see Figure 2). 
Writing f(x + e)=f(x), he cancelled equal terms, divided through by e, 
and finally discarded the remaining terms involving e, on account of the 
fact that the two roots are equal (so e=0) when c=f(x) is the maximum 
value of f. 


EXERCISE 1. If f(x) = Σ a_i x^i is a polynomial, verify that f(x + e) − f(x) is divisible
by e. 


EXERCISE 2. Apply Fermat’s method formally to find the maximum value of 
f(x) = bx^2 − x^3 for 0 ≤ x ≤ b.


Fermat used a similar “pseudo-equality” technique to construct tangent 
lines. From the similar triangles in Figure 3, we read off the proportion 


\[ \frac{s+e}{s} = \frac{k}{f(x)}. \]

Upon substituting k ~ f(x + e), we solve for the sub-tangent s,

\[ s \sim \frac{e f(x)}{f(x+e) - f(x)}. \tag{1} \]

If we cancel the e in the numerator into the f(x + e) − f(x) in the
denominator (assuming f is a polynomial so Exercise 1 applies), and
discard the remaining terms in the denominator that involve e, we then
obtain an expression for the sub-tangent. In modern terms, this corresponds
to writing

\[ s \sim \frac{f(x)}{[f(x+e) - f(x)]/e} \]

and then taking the limit as e → 0 to obtain

\[ s = \frac{f(x)}{f'(x)}. \tag{2} \]


Figure 3


Since the slope of the tangent line is f(x)/s, Equation (2) identifies the
slope of the tangent line to the curve y = f(x) with the derivative f′(x).
For example, with f(x) = x^2, Equation (1) gives

\[ s \sim \frac{e x^2}{(x+e)^2 - x^2} = \frac{x^2}{2x+e}. \]

Suppression of the remaining e yields s = x/2, so the slope of the tangent
line to the parabola y = x^2 is

\[ \frac{f(x)}{s} = \frac{x^2}{x/2} = 2x. \]
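For polynomials, Fermat’s steps (expand f(x + e), cancel f(x), divide by e, discard what still contains e) can be carried out mechanically on coefficients. The sketch below is a modern formalization under that reading; the names `fermat_slope_poly` and `evaluate` are hypothetical, and polynomials are stored as lists with coeffs[i] the coefficient of x^i.

```python
from math import comb

def fermat_slope_poly(coeffs):
    """Coefficients of [f(x+e) - f(x)]/e after discarding terms that still contain e."""
    quotient = {}                          # (power of x, power of e) -> coefficient
    for i, c in enumerate(coeffs):
        # (x + e)^i = sum over j of comb(i, j) x^(i-j) e^j; the j = 0 term cancels f(x).
        for j in range(1, i + 1):
            key = (i - j, j - 1)           # dividing through by e lowers the power of e
            quotient[key] = quotient.get(key, 0) + c * comb(i, j)
    degree = max(len(coeffs) - 1, 1)
    # Keep only the terms free of e.
    return [quotient.get((m, 0), 0) for m in range(degree)]

def evaluate(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

parabola = [0, 0, 1]                       # f(x) = x^2
slope = fermat_slope_poly(parabola)        # coefficients of 2x
x = 3
subtangent = evaluate(parabola, x) / evaluate(slope, x)
print(slope, subtangent)                   # sub-tangent should equal x/2
```

The same routine applied to bx − x^2 returns b − 2x, whose vanishing reproduces x = b/2 from the maximum problem.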


EXERCISE 3. Apply the above method of Fermat to show that the sub-tangent to
y = x^n is s = x/n, so the slope of the tangent line is nx^{n−1}.


Descartes’ Circle Method 


Descartes devised a method of constructing tangent lines that was algebraic
rather than infinitesimal in character. Although Fermat’s approach
struck closer to the infinitesimal heart of the matter, Descartes’ algebraic
approach probably exerted a greater influence on the immediate development
of the calculus. Descartes’ appreciation of the importance of the
problem of constructing tangent lines was expressed in his statement that it
is not only “the most useful and general problem that I know but even that
I have ever desired to know in geometry.”


Figure 4


His method of finding the tangent line to the curve y = f(x) at the point 
P(x, f(x)) involved first locating the point C(v, 0) of intersection with the 
x-axis of the normal line to the curve at P. The tangent line can then be 
taken as the perpendicular through P to the normal line. 

In general, a circle with center C(v, 0) and radius r= CP will intersect 
the curve y = f(x) in a second point near P (see Fig. 4). If, however, CP is 
the normal to the curve at P, then the point P should be a “double point” 
of intersection of the curve y = f(x) and the circle y^2 + (x − v)^2 = r^2.
Assuming that [f(x)]^2 is a polynomial, this means that the equation

\[ [f(x)]^2 + (v - x)^2 = r^2 \tag{3} \]

(with v and r fixed) will have the coordinate x of P as a double root.

Now a polynomial which has a double root, say x = e, must be of the
form (x − e)^2 Σ c_i x^i. Descartes imposed the condition that Equation (3)
have a double root by writing

\[ [f(x)]^2 + (v - x)^2 - r^2 = (x - e)^2 \sum c_i x^i. \tag{4} \]


By equating like powers of x, he then solved for v in terms of the root 
e= x. The slope of the tangent line at P is then (v — x)/f(x) (the negative 
reciprocal of the slope — f(x)/(v — x) of the normal CP in Figure 4). 

For example, consider the parabola y^2 = kx, or y = f(x) = √(kx). Then
Equation (3) is

\[ kx + (v - x)^2 - r^2 = 0. \]

This is a 2nd degree equation, so the right-hand side of (4) should be a
polynomial of degree 2, hence

\[ kx + (v - x)^2 - r^2 = (x - e)^2. \]

Equating coefficients of x gives k − 2v = −2e, or v = e + k/2. Substituting
e = x, the subnormal v − x is k/2, and the slope of the tangent line to the
parabola at (x, √(kx)) is

\[ \frac{v - x}{f(x)} = \frac{k/2}{\sqrt{kx}} = \frac{1}{2}\sqrt{\frac{k}{x}}. \]


For the parabola $y = x^2$, we write Equation (4) in the form

$$x^4 + (v - x)^2 - r^2 = (x - e)^2 (x^2 + ax + b),$$

since the left-hand side is a 4th degree polynomial with leading coefficient
1. Expansion gives

$$x^4 + x^2 - 2vx + (v^2 - r^2) = x^4 + (a - 2e)x^3 + (b - 2ae + e^2)x^2 + (ae^2 - 2be)x + be^2.$$

Equating coefficients gives the equations

$$a - 2e = 0, \qquad b - 2ae + e^2 = 1, \qquad ae^2 - 2be = -2v,$$

which we solve for $v = 2e^3 + e$. Substituting e = x, the subnormal is
$v - x = 2x^3$, and the slope of the tangent line to the parabola at the point $(x, x^2)$ is

$$\frac{v - x}{f(x)} = \frac{2x^3}{x^2} = 2x.$$
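
The double-root condition in this example can be checked mechanically. The following is a minimal sketch, not part of Descartes' own procedure: it assumes the value $v = 2e^3 + e$ found above, chooses $r^2$ so that the circle passes through P, and divides the circle polynomial twice by (x - e) using exact rational arithmetic. The function names are illustrative.

```python
from fractions import Fraction

def divide_by_double_root(coeffs, e):
    """Divide a polynomial by (x - e)^2 via two rounds of synthetic division.

    coeffs lists coefficients from the highest power down to the constant.
    Returns (quotient, first_remainder, second_remainder).
    """
    def synthetic(cs, root):
        out = [cs[0]]
        for c in cs[1:]:
            out.append(c + root * out[-1])
        return out[:-1], out[-1]

    q1, rem1 = synthetic(coeffs, e)
    q2, rem2 = synthetic(q1, e)
    return q2, rem1, rem2

def circle_polynomial(e):
    """Coefficients of x^4 + (v - x)^2 - r^2 for the parabola y = x^2.

    Assumes v = 2e^3 + e (the value derived in the text) and picks
    r^2 = e^4 + (v - e)^2 so that the circle passes through P(e, e^2).
    """
    v = 2 * e**3 + e
    r2 = e**4 + (v - e)**2
    return [Fraction(1), Fraction(0), Fraction(1), -2 * v, v**2 - r2]

e = Fraction(3, 2)
quotient, rem1, rem2 = divide_by_double_root(circle_polynomial(e), e)
# Both remainders vanish exactly when x = e is a double root; the quotient
# is x^2 + ax + b with a = 2e and b = 1 + 3e^2, as the coefficient equations require.
```

Changing the assumed value of v makes the second remainder nonzero, which is the algebraic sign that the circle then cuts the curve in two distinct points near P.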


EXERCISE 4. If $y = x^{3/2}$, apply Descartes’ method to show that the subnormal is
$v - x = 3x^2/2$ and the slope of the tangent line is $3x^{1/2}/2$.


The Rules of Hudde and Sluse 


As may be guessed from the examples in the preceding section, the direct 
application of Descartes’ circle method to any but the very simplest curves 
leads to prohibitively tedious algebraic computations. However, formal 
algorithms for the construction of tangents were discovered in the 1650s
by the Dutch mathematicians Johann Hudde and René François de Sluse.
Their mechanical rules made possible the routine computation of the
slopes of tangent lines to arbitrary algebraic curves.
Hudde’s rule provides a convenient means of determining the double
roots that Descartes’ circle method calls for. Given a polynomial

$$F(x) = \sum_{i=0}^{n} a_i x^i,$$

a second polynomial F*(x) is constructed as follows. The terms of F(x),
arranged in order of increasing degree, are multiplied in turn by the terms
of an arbitrary arithmetic progression

$$a,\; a + b,\; a + 2b,\; \ldots,\; a + nb. \tag{5}$$

The resulting polynomial is

$$F^*(x) = \sum_{i=0}^{n} a_i (a + ib) x^i. \tag{6}$$

Note that if a = 0, b = 1, so the term in $x^i$ is multiplied by i, then
$F^*(x) = \sum i a_i x^i = xF'(x)$, where $F'(x) = \sum i a_i x^{i-1}$ is the now-familiar
derivative of the polynomial F(x). In general,

$$F^*(x) = aF(x) + bxF'(x). \tag{7}$$


Hudde’s rule states that any double root of F(x) = 0 must be a root of
F*(x) = 0. This algebraic fact can be established easily as follows. If e is a
double root of F(x), then we can write

$$F(x) = (x - e)^2 \sum c_i x^i = \sum c_i (x^{i+2} - 2e x^{i+1} + e^2 x^i).$$

If $A_i = a + bi$, then

$$\begin{aligned}
F^*(x) &= \sum c_i (A_{i+2} x^{i+2} - 2e A_{i+1} x^{i+1} + e^2 A_i x^i) \\
&= \sum c_i [(A_i + 2b)x^2 - 2e(A_i + b)x + e^2 A_i] x^i \\
&= \sum c_i [A_i (x - e)^2 + 2bx(x - e)] x^i,
\end{aligned}$$

whence it is clear that e is a root of F*(x).

In particular, any double root of the polynomial F(x) must be a root of 
its derivative F’(x). It was, in fact, this appearance of F’(x), in the wholly 
algebraic (not infinitesimal) context of Hudde’s rule, that first brought out 
the computational importance of what we now call the derivative of a 
polynomial. 
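
Hudde's rule is easy to state as a procedure. The sketch below is an illustration under our own conventions (coefficient lists in order of increasing degree, helper names of our choosing): it multiplies the coefficients of a polynomial by the terms of an arithmetic progression and evaluates the result at a known double root for several choices of a and b.

```python
from fractions import Fraction

def hudde(coeffs, a, b):
    """Multiply the coefficient of x^i by a + i*b (coeffs in increasing degree)."""
    return [(a + i * b) * c for i, c in enumerate(coeffs)]

def evaluate(coeffs, x):
    """Value of the polynomial with the given coefficients at x."""
    return sum(c * x**i for i, c in enumerate(coeffs))

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

# F(x) = (x - 2)^2 (x^2 + 3x + 5) has the double root e = 2.
e = 2
F = poly_mul(poly_mul([-e, 1], [-e, 1]), [5, 3, 1])

# F*(e) for several arithmetic progressions a, a + b, a + 2b, ...
progressions = [(0, 1), (1, 1), (7, -3), (Fraction(1, 2), 4)]
values = [evaluate(hudde(F, a, b), e) for a, b in progressions]
```

Every entry of `values` is zero whatever progression is chosen, in agreement with the proof above; with a = 0, b = 1 the function `hudde` returns the coefficients of xF'(x).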

For example, to apply Hudde’s rule to the parabola $y^2 = kx$, we start
with the Cartesian circle condition

$$F(x) = kx + (v - x)^2 - r^2 = 0$$

as before. Taking a = 0, b = 1 so F*(x) = xF'(x), Hudde’s rule states that x
is a double root of F(x) only if it is a root of

$$F^*(x) = (1)kx + (0)v^2 - (1)2vx + (2)x^2 - (0)r^2 = 0,$$

$$kx - 2vx + 2x^2 = 0,$$

from which we solve for the subnormal $v - x = \tfrac{1}{2}k$. Hence the slope of the
tangent line is $(v - x)/\sqrt{kx} = \tfrac{1}{2}\sqrt{k/x}$ as before.

The computation of the slope of the tangent line to $y = x^n$ by direct
application of Descartes’ circle method would be extremely tedious if
n > 2. The circle condition is

$$F(x) = x^{2n} + (v - x)^2 - r^2 = 0,$$

so

$$F^*(x) = (2n)x^{2n} + (0)v^2 - (1)2vx + (2)x^2 - (0)r^2 = 0,$$

or

$$2n x^{2n} - 2vx + 2x^2 = 0.$$

We immediately solve this equation for the subnormal $v - x = n x^{2n-1}$, so
the slope of the tangent line is

$$\frac{v - x}{x^n} = n x^{n-1},$$

the familiar derivative of $x^n$. Note that this computation remains valid if n
is half of a positive integer.
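
The same computation can be carried out for any curve whose square is a polynomial. As a sketch (the function names and the coefficient-list convention are ours, not Hudde's), write $y^2 = g(x)$ with $g(x) = \sum g_i x^i$; Hudde's rule with a = 0, b = 1 applied to the circle condition gives $\sum i g_i x^i - 2vx + 2x^2 = 0$, so the subnormal is $v - x = \tfrac{1}{2}\sum i g_i x^{i-1}$.

```python
from fractions import Fraction

def subnormal(g, x):
    """Subnormal v - x at abscissa x for the curve y^2 = g(x).

    g lists the coefficients of g(x) = [f(x)]^2 in increasing degree.
    Hudde's rule (a = 0, b = 1) applied to g(x) + (v - x)^2 - r^2 yields
    v - x = sum(i * g_i * x^(i-1)) / 2.
    """
    return sum(i * c * x**(i - 1) for i, c in enumerate(g) if i > 0) / 2

def tangent_slope_power(n, x):
    """Slope of the tangent to y = x^n (n a positive integer) at x > 0."""
    g = [0] * (2 * n) + [1]          # g(x) = x^(2n)
    return subnormal(g, x) / x**n    # slope = subnormal / ordinate

x = Fraction(3, 2)
slopes = {n: tangent_slope_power(n, x) for n in range(1, 6)}

# Parabola y^2 = kx with k = 6: the subnormal is k/2 at every point.
parabola_subnormal = subnormal([0, 6], Fraction(5))
```

Each entry `slopes[n]` agrees with $n x^{n-1}$, and the parabola's subnormal is constant, as found above.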


EXERCISE 5. Apply the Cartesian circle method using Hudde’s rule to show that the
slope of the tangent line to $y = (x^2 + 1)^{3/2}$ is $3x(x^2 + 1)^{1/2}$.


Note that Hudde’s rule for applying the Cartesian circle condition 
amounts (in the language of derivatives) to the following. Write 


$$F(x) = [f(x)]^2 + (v - x)^2 - r^2 = 0,$$


and then solve the equation F’(x)=0 for x in terms of v. Thus, thinking of 
a fixed point C(v, 0) on the x-axis, we are finding x such that the distance 
from C to the point P(x, f(x)) of the curve y = f(x) is extremal. 

Hudde’s rule can also be applied directly to maximum-minimum problems.
Recall that Fermat started with the observation that the maximum
(or minimum) value M of f(x) occurs at a double root of the equation
f(x) = M or

$$F(x) = f(x) - M = 0,$$

and hence at a root of F*(x) = 0. From Equation (7) we therefore see that
a (local) maximum or minimum of f(x) occurs at a root of the equation
f'(x) = 0 (just as we learn in elementary calculus).

The combination of Hudde’s rule and the Cartesian circle method
applied only to algebraic curves that could be described in explicit form,
y = f(x). However, Sluse stated an even more mechanical rule that applied
equally well (and even more easily) to algebraic curves described in implicit
form, f(x, y) = 0, where

$$f(x, y) = \sum c_{ij} x^i y^j$$

is a polynomial in x and y. Given the arithmetic progression (5), define

$$f_x^*(x, y) = \sum (a + bi) c_{ij} x^i y^j \quad\text{and}\quad f_y^*(x, y) = \sum (a + bj) c_{ij} x^i y^j. \tag{8}$$

Then Sluse’s rule states that the slope m of the tangent line at a point (x, y)
on the curve f(x, y) = 0 is given by

$$m = -\frac{y f_x^*}{x f_y^*}. \tag{9}$$

EXERCISE 6. Apply Sluse’s rule to $y = \sum c_i x^i$ to obtain $m = \sum i c_i x^{i-1}$. Thus Sluse’s
rule provides a completely algorithmic approach to derivatives of polynomials.


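EXERCISE 7. (a) Show that

$$f_x^*(x, y) = af(x, y) + bxf_x(x, y) \quad\text{and}\quad f_y^*(x, y) = af(x, y) + byf_y(x, y),$$

where $f_x$ and $f_y$ denote the partial derivatives

$$f_x = \sum i c_{ij} x^{i-1} y^j \quad\text{and}\quad f_y = \sum j c_{ij} x^i y^{j-1}.$$

(b) Conclude that Sluse’s rule (9) is equivalent to the familiar result
$dy/dx = -(\partial f/\partial x)/(\partial f/\partial y)$ that is obtained by using the chain rule to
differentiate f(x, y) = 0 with respect to x,

$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = 0.$$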


As an application of Sluse’s rule, consider the folium of Descartes,

$$f(x, y) = x^3 + y^3 - 3xy = 0$$

(which, as a matter of fact, was originally proposed by Descartes as a
challenge for Fermat to find its tangent line). Taking a = 0, b = 1 in (8) and
(9), we immediately obtain

$$m = -\frac{y}{x} \cdot \frac{3x^3 - 3xy}{3y^3 - 3xy} = \frac{y - x^2}{y^2 - x}.$$
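
Sluse's rule translates directly into a short procedure. The sketch below assumes a representation of our own choosing (a dictionary sending each exponent pair (i, j) to its coefficient) and, to obtain an exact point on the folium, uses the standard rational parametrisation x = 3t/(1 + t^3), y = 3t^2/(1 + t^3), which does not appear in the text.

```python
from fractions import Fraction

def sluse_slope(c, x, y, a=0, b=1):
    """Tangent slope by Sluse's rule for the curve sum(c[i, j] * x^i * y^j) = 0.

    c maps exponent pairs (i, j) to coefficients; (x, y) must lie on the curve.
    """
    fx_star = sum((a + b * i) * cij * x**i * y**j for (i, j), cij in c.items())
    fy_star = sum((a + b * j) * cij * x**i * y**j for (i, j), cij in c.items())
    return -(y * fx_star) / (x * fy_star)

# Folium of Descartes: x^3 + y^3 - 3xy = 0.
folium = {(3, 0): 1, (0, 3): 1, (1, 1): -3}

# Assumed parametrisation (not from the text) supplying an exact point on the curve.
t = Fraction(2)
x, y = 3 * t / (1 + t**3), 3 * t**2 / (1 + t**3)

m = sluse_slope(folium, x, y)             # rule (9) with a = 0, b = 1
closed_form = (y - x**2) / (y**2 - x)     # the simplified expression found above
```

At the point (2/3, 4/3) both `m` and `closed_form` equal 4/5.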

EXERCISE 8. Write down the slope m given by Sluse’s rule for the folium of
Descartes, but taking a = 2, b = 1 in (8). Reconcile this answer and the one
obtained above.


Sluse’s rule was published in the 1673 Philosophical Transactions without 
explanation as to how he had discovered it some years earlier. If it was not 
deduced merely by inference from particular examples, a plausible possi- 
bility is that it was derived from Hudde’s rule. An obvious connection 
between the two rules is suggested by their verbal statements in terms of 
arithmetic progressions.

From an algebraic point of view, a tangent line to the curve
$f(x, y) = \sum c_{ij} x^i y^j = 0$ is a straight line y = mx + k that intersects the curve in a
double point. That is, the polynomial

$$F(x) = f(x, mx + k) = 0$$

should have a double root. But

$$F^*(x) = f_x^* + \frac{mx}{y} f_y^*, \tag{10}$$

so Hudde’s double root condition F*(x) = 0 immediately implies Sluse’s
rule (9).

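By the linearity of the operations that produce $F^*$, $f_x^*$, $f_y^*$ from F, f, it
suffices to establish (10) in the special case $f(x, y) = x^i y^j$. Then

$$F(x) = x^i (mx + k)^j = \sum_{p=0}^{j} \binom{j}{p} m^p x^{i+p} k^{j-p}$$

by the binomial formula, where $\binom{j}{p} = j!/p!(j - p)!$. Therefore, with a = 0,
b = 1, we obtain

$$\begin{aligned}
F^*(x) &= \sum_{p=0}^{j} \binom{j}{p} m^p (i + p) x^{i+p} k^{j-p} \\
&= i x^i \sum_{p=0}^{j} \binom{j}{p} m^p x^p k^{j-p} + m x^{i+1} \sum_{p=1}^{j} p \binom{j}{p} m^{p-1} x^{p-1} k^{j-p} \\
&= i x^i (mx + k)^j + m x^{i+1} \sum_{p=1}^{j} j \binom{j-1}{p-1} m^{p-1} x^{p-1} k^{j-p} \\
&= i x^i y^j + m j x^{i+1} \sum_{p=0}^{j-1} \binom{j-1}{p} (mx)^p k^{j-1-p} \\
&= i x^i y^j + m j x^{i+1} (mx + k)^{j-1} \\
&= i x^i y^j + m j x^{i+1} y^{j-1} \\
&= f_x^* + \frac{mx}{y} f_y^*
\end{aligned}$$

as desired. We have used the fact that

$$p\binom{j}{p} = \frac{p \cdot j!}{p!\,(j - p)!} = \frac{j \cdot (j - 1)!}{(p - 1)!\,(j - p)!} = j\binom{j-1}{p-1}.$$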


Whatever may have been the means by which Sluse’s rule was first 
discovered, the principal significance of the rules of Sluse and Hudde lay 
in the fact that they provided general algorithms by which tangents to 
algebraic curves could be constructed in a routine manner. It was no 
longer necessary to resort to special devices adapted to particular curves, 
nor to give in every case a complete demonstration of the process. For
these reasons, the rules of Sluse and Hudde were perhaps the first methods
to exhibit fully the algorithmic approach that is a distinctive feature of the 
calculus. 


Infinitesimal Tangent Methods 


The introduction in the 1650s of the algebraic rules of Hudde and Sluse 
was soon followed by infinitesimal derivations of these and similar 
methods. These newer derivations and methods owed more to the ideas of 
Fermat than those of Descartes, and involved the concept of a tangent line 
at the point P of a curve as the limiting position of a secant line PQ as Q 
approaches P along the curve. 

One such method was described by Isaac Barrow (1630-1677) in his 
Geometrical Lectures that were published in 1670 but delivered at Cam- 
bridge in the mid 1660s (and probably attended in 1664-65 by one Isaac 
Newton). Barrow was appointed in 1663 as the first Lucasian Professor of 
Mathematics at Cambridge, and resigned this chair in 1669 in favor of 
Newton (and perhaps also to qualify for administrative advancement). 

The bulk of Barrow’s published lectures treat tangent and quadrature 
problems from a somewhat classical and geometrical rather than analytical 
point of view. For example, he generally adopts the Greek definition of a 
tangent line to a curve as a straight line that touches the curve at a single 
point. However, at the close of Lecture X, he writes, 


We have now finished in some fashion the first part, as we declared, of 
our subject. Supplementary to this we add, in the form of appendices, a 
method for finding tangents by calculation frequently used by us. 
Although I hardly know, after so many well-known and well-worn 
methods of the kind above, whether there is any advantage in doing so. 
Yet I do so on the advice of a friend [who turned out to be Newton]: and 
all the more willingly, because it seems to be more profitable and general 
than those which I have discussed ([2], p. 119).


He proceeds to describe what is apparently his own modification of a
method that Fermat had devised (but not published) to construct tangent
lines to a curve defined implicitly by f(x, y) = 0 (for discussions of
Fermat’s method, see the articles by Coolidge [3], pp. 452-453 and Jensen [4]).
Considering an “indefinitely small arc” MN of the curve (Fig. 5),
he writes M(x, y) and N(x + e, y + a) for the coordinates of its endpoints, and sets

$$f(x + e, y + a) = f(x, y) = 0, \tag{11}$$

since M and N are both points of the curve. He then deletes “all terms
containing a power of a or e, or products of these (for these terms have no
value).” Finally, ignoring the distinction between the “indefinitely small
arc” MN and the straight line segment MN, he notes the similarity of the
triangle TQM and the “characteristic triangle” MNR, and solves (11) (with
the higher degree terms in a and e deleted) for the slope y/t = a/e of the
tangent line at M. Thus Barrow employs the concept of the “characteristic
triangle” (essentially the idea of the tangent line as the limiting position of
the secant line as a and e approach 0) and takes the limit by the expedient
of neglecting “higher order infinitesimals.”


Figure 5

For example, for the folium of Descartes, $f(x, y) = x^3 + y^3 - 3xy = 0$, we
would write

$$(x + e)^3 + (y + a)^3 - 3(x + e)(y + a) = x^3 + y^3 - 3xy,$$

$$3x^2 e + 3xe^2 + e^3 + 3y^2 a + 3ya^2 + a^3 - 3xa - 3ye - 3ae = 0,$$

delete all higher degree terms in a and e to obtain

$$3x^2 e + 3y^2 a - 3xa - 3ye = 0,$$

and finally solve for the slope

$$\frac{a}{e} = \frac{y - x^2}{y^2 - x}.$$


EXERCISE 9. Apply Barrow’s method to the curve $y = x^n$ to obtain the slope
$a/e = nx^{n-1}$ of its tangent line.


In general, given the curve

$$f(x, y) = \sum c_{ij} x^i y^j = 0,$$

we write

$$\sum c_{ij} (x + e)^i (y + a)^j = \sum c_{ij} x^i y^j.$$

Expansion of $(x + e)^i$ and $(y + a)^j$ by the binomial formula gives

$$\sum c_{ij} (x^i + i x^{i-1} e + \cdots)(y^j + j y^{j-1} a + \cdots) = \sum c_{ij} x^i y^j.$$

Neglecting higher order terms in a and e, we obtain

$$\sum c_{ij} (i x^{i-1} y^j e + j x^i y^{j-1} a) = 0,$$

so the slope of the tangent line is

$$m = \frac{a}{e} = -\frac{\sum i c_{ij} x^{i-1} y^j}{\sum j c_{ij} x^i y^{j-1}} = -\frac{\partial f/\partial x}{\partial f/\partial y}.$$


Thus Barrow’s approach yields an analytical derivation of Sluse’s rule. 
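
Barrow's recipe (expand, discard every term of degree two or more in a and e, solve for a/e) can be followed literally in a few lines. This is only a sketch under our own conventions: the curve is given as a dictionary of coefficients, and the point is supplied as exact rationals.

```python
from fractions import Fraction
from math import comb

def barrow_slope(c, x, y):
    """Slope a/e at the point (x, y) of the curve sum(c[i, j] * x^i * y^j) = 0.

    Expands each (x + e)^i (y + a)^j by the binomial formula, collects the
    coefficient of every monomial e^p a^q, and keeps only the first-order terms.
    """
    expansion = {}                      # maps (p, q) to the coefficient of e^p a^q
    for (i, j), cij in c.items():
        for p in range(i + 1):
            for q in range(j + 1):
                term = cij * comb(i, p) * x**(i - p) * comb(j, q) * y**(j - q)
                expansion[p, q] = expansion.get((p, q), 0) + term
    coeff_e = expansion.get((1, 0), 0)  # multiplies e
    coeff_a = expansion.get((0, 1), 0)  # multiplies a
    # coeff_e * e + coeff_a * a = 0 once the higher-order terms are discarded.
    return -coeff_e / coeff_a

folium = {(3, 0): 1, (0, 3): 1, (1, 1): -3}
x, y = Fraction(2, 3), Fraction(4, 3)   # a point on x^3 + y^3 - 3xy = 0
slope = barrow_slope(folium, x, y)
```

The dictionary `expansion` also holds the discarded higher-order coefficients, so one can inspect exactly which terms Barrow's rule throws away.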


Composition of Instantaneous Motions 


During the 1630s and 1640s an approach to tangent lines that stemmed 
from the intuitive concept of instantaneous motion was developed by 
Torricelli and especially by Gilles Persone de Roberval, who was a 
professor at the College Royal (France) from 1634 until his death in 1675. 
Their idea (not itself a new one) was to consider a curve as the path of a 
moving point, and the tangent line as the line of instantaneous motion 
of the moving point. If the motion of the point generating the curve is the 
resultant or combination of two sufficiently simple motions, then the 
instantaneous line of motion can be determined by composition of the 
constituent motions. 

The parallelogram law for the addition of constant velocity vectors was
well known. That is, if the points P and Q move along two intersecting
straight lines with constant velocity vectors u and v, respectively, and these
two lines are taken as x- and y-axes, then the motion of the point R, whose
x- and y-coordinates are given by P and Q, has velocity vector w = u + v
(Fig. 6).

Roberval took the further step of applying the parallelogram law to 
instantaneous velocity vectors. That is, if the motion of a point is com- 
pounded of two simpler motions, he assumed that its instantaneous veloc- 
ity vector is the parallelogram sum of the instantaneous velocity vectors 
corresponding to the two simpler motions. 


Figure 6


Figure 7


For example, consider the Archimedean spiral given in polar coordinates
by r = at, θ = ωt. The motion of the point P(at, ωt) along the spiral may be
regarded as the resultant of a radial motion (away from the origin) and an
angular motion. To find the tangent to the spiral at P, we therefore
construct a radial vector of length a (the radial speed) and a vector of
length rω (the angular speed) tangential to the circle of radius r through P.
The diagonal of the parallelogram determined by these two vectors is the
velocity vector at P, and therefore determines the tangent line to the spiral
at P (Fig. 7).

An outstanding success of the instantaneous motion approach was the 
determination of the tangent to the cycloid. Consider a circle of radius a 
that is initially tangent to the x-axis at the origin, and thereafter rolls along 
the x-axis to the right with unit angular speed (one radian/sec). Then the 
cycloid is the trajectory of the point P on the circle that was initially at the 
origin, and is given in rectangular coordinates by x = a(t - sin t),
y = a(1 - cos t). See Figure 8.

Figure 8


Roberval regarded the motion of the point P along the cycloid as
compounded of (1) uniform translation to the right with speed a, and (2)
clockwise rotation with unit angular speed, centered at time t at the point
(at, a). The corresponding instantaneous velocity vectors are given in
rectangular coordinates by 


$$u = (a, 0) \quad\text{(translation)}$$

and

$$w = (-a\cos t,\; a\sin t) \quad\text{(rotation)}.$$

Their parallelogram sum (given in rectangular coordinates by coordinate-wise
addition) is the velocity vector

$$v = (a(1 - \cos t),\; a\sin t),$$

which determines the tangent line to the cycloid at P. Note that this result
is the same as that obtained by coordinate-wise differentiation of the
position vector (a(t - sin t), a(1 - cos t)). From the modern viewpoint, this
latter observation is what verifies (in this example, at least) the validity of 
the process of combining instantaneous velocity vectors by parallelogram 
addition. 
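
The agreement between Roberval's composition and differentiation of the position vector can be checked numerically. The following sketch compares the parallelogram sum of the two velocities with a central-difference approximation to the derivative of the cycloid's position; the radius, the time, and the step size h are arbitrary illustrative values.

```python
import math

def roberval_velocity(a, t):
    """Parallelogram sum of the translation and rotation velocities at time t."""
    translation = (a, 0.0)
    rotation = (-a * math.cos(t), a * math.sin(t))
    return (translation[0] + rotation[0], translation[1] + rotation[1])

def position(a, t):
    """Point of the cycloid generated by a circle of radius a at time t."""
    return (a * (t - math.sin(t)), a * (1 - math.cos(t)))

def numerical_velocity(a, t, h=1e-6):
    """Central-difference approximation to the derivative of the position vector."""
    (x0, y0), (x1, y1) = position(a, t - h), position(a, t + h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

a, t = 2.0, 1.2
composed = roberval_velocity(a, t)
differenced = numerical_velocity(a, t)
```

The two vectors agree to within the accuracy of the difference quotient, which is the modern justification mentioned above for this particular example.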


EXERCISE 10. Prove that the tangent vector to the cycloid, calculated above, is 
perpendicular to the line through the point P on the cycloid and the point T of 
contact between the rolling circle and the x-axis. 


According to the focus-directrix definition, the parabola $y^2 = 4px$ is the
locus of a point that is equidistant from the directrix x = -p and the focus
F(p, 0). A point P moving along the parabola subject to this condition has
equal components of velocity directed away from the directrix and away
from the focus. It therefore appears that the tangent line to the parabola at
P bisects the angle between the line PF and the line through P perpendicular
to the directrix (Fig. 9).


Figure 9


Figure 10

Similarly, an ellipse may be defined as the locus of a point, the sum of 
whose distances u and v from two foci F and F’, respectively, is constant, 
u + v = 2a. If a point P moves along the ellipse subject to this condition,
then the rate of increase of u and the rate of decrease of v must be equal. 
Thus the point undergoes motions of equal magnitude away from F and 
towards F’. It therefore appears that the tangent line to the ellipse at P 
bisects the angle between two unit vectors at P, one directed away from F 
and the other directed towards F’ (Fig. 10). 

The following two exercises indicate that it was something of a stroke of 
good fortune that Roberval obtained the correct tangent lines to the 
parabola and ellipse by this method, for it gives in each case only the 
correct direction and not the correct magnitude of the velocity vector. 


EXERCISE 11. Let u denote the distance of a moving point P on the parabola
$y^2 = 4px$ from the directrix x = -p and from the focus (p, 0). If the point moves in
such a way that u' = x' = 1 (unit horizontal speed), show that the tangent vector $\mathbf{t}$
shown in Figure 9 is $\mathbf{t} = (2x/(x + p),\; y/(x + p))$, while the actual velocity vector of
P is $\mathbf{v} = (1, \sqrt{p/x})$. Then show that $\mathbf{t}$ and $\mathbf{v}$ point in the same direction, but

$$|\mathbf{t}| = 2\sqrt{\frac{x}{x + p}} \quad\text{while}\quad |\mathbf{v}| = \sqrt{\frac{x + p}{x}}.$$


EXERCISE 12. Let u and v denote the distances of the point P on the ellipse
$x^2/a^2 + y^2/b^2 = 1$ from the foci F(-c, 0) and F'(c, 0), respectively
($c = \sqrt{a^2 - b^2}$). Then $u = \sqrt{(x + c)^2 + y^2}$ and $v = \sqrt{(x - c)^2 + y^2}$. Consider the
point P as moving clockwise around the ellipse subject to the condition u + v = 2a,
with u' = 1, v' = -1. When P is at the point (0, b), show that the vector $\mathbf{t}$ shown in
Figure 10 is $\mathbf{t} = (2c/a, 0)$, while the actual velocity vector $\mathbf{v} = (x', y')$ is $\mathbf{v} = (a/c, 0)$.
Hint: Subtraction of the equations $(x + c)^2 + y^2 = u^2$ and $(x - c)^2 + y^2 = v^2$ gives

$$4xc = u^2 - v^2.$$


The Relationship Between Quadratures and Tangents


The application of time and motion concepts to the study of curves led 
both Torricelli and Barrow to at least an intuitive understanding of the 
inverse relationship between tangent and quadrature problems, that is, 
between the operations of differentiation and integration.

On the one hand, medieval investigations and the subsequent work of 
Galileo suggested that the motion of a point, along a straight line with 
varying velocity, be represented by means of a graph of its velocity versus 
time. Indivisibles considerations then indicated that the total distance 
traveled by the point would equal the area under the velocity-time curve, 
because the distance traveled during an infinitesimal element of time 
would equal the product of this time element and the instantaneous 
velocity (Fig. 11). 

For example, if the point began its motion at time t = 0 and moved with
velocity $v = t^n$ at time t, the distance y traveled would equal the area under
the curve $v = t^n$, so

$$y = \frac{t^{n+1}}{n + 1}. \tag{12}$$

On the other hand, the same motion could be represented by a graph of
position versus time. If a point moves along the curve y = y(t) with
horizontal speed 1 and vertical speed v (the velocity of the point whose
motion is represented in Fig. 11), the velocity vector of this point will be
the resultant of a horizontal vector of length 1 and a vertical vector of
length v (Fig. 12). Consequently the slope of the tangent line to the
position curve y = y(t) will be the velocity v.

For example, if the distance traveled in time t is given by Equation (12),
then the velocity must be

$$v = t^n, \tag{13}$$

because $t^n$ is the slope of the tangent line to the curve $y = t^{n+1}/(n + 1)$.
Thus Equations (12) and (13) imply each other. Whereas the two facts
that

(a) the area under the curve $y = x^n$ is $x^{n+1}/(n + 1)$, and
(b) the tangent line to the curve $y = x^{n+1}/(n + 1)$ has slope $x^n$,

had originally been deduced from entirely separate considerations, the
relationship between Figures 11 and 12 showed that each of these facts
followed from the other.


Figure 11 (velocity-time curve; the area under the curve equals the distance y)


Figure 12 (position-time curve)

Specifically, the relationship between these two figures is that the slope
of the tangent line to the area curve y = y(t) (Fig. 12) is equal to the
ordinate of the original curve v = v(t) (Fig. 11). This is an embryonic
formulation of the fundamental theorem of calculus: the rate of change of
the area under a curve is equal to its ordinate. As we will see in Chapter 8,
this idea was Newton’s starting point for the development of an algorithmic
calculus. Chapters 6 and 7 will be devoted to the historical introduction
of two additional analytical tools that played important roles in the
computational machinery of the calculus: logarithms and infinite series.
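
The mutual implication of Equations (12) and (13) can be illustrated numerically. The sketch below uses modern approximations that are our own choice rather than anything from the historical sources: it estimates the area under $v = t^n$ by a midpoint sum and the slope of $y = t^{n+1}/(n + 1)$ by a central difference. The exponent, the time, and the step sizes are illustrative.

```python
def area_under_power(n, t, steps=100_000):
    """Midpoint-rule approximation to the area under v = t^n over [0, t]."""
    h = t / steps
    return sum(((k + 0.5) * h) ** n for k in range(steps)) * h

def slope_of_area_curve(n, t, h=1e-5):
    """Central-difference slope of y = t^(n+1)/(n+1) at time t."""
    y = lambda s: s ** (n + 1) / (n + 1)
    return (y(t + h) - y(t - h)) / (2 * h)

n, t = 3, 1.5
distance = area_under_power(n, t)        # compare with t^(n+1)/(n+1), Equation (12)
velocity = slope_of_area_curve(n, t)     # compare with t^n, Equation (13)
```

The first quantity reproduces the distance in Equation (12) and the second recovers the velocity in Equation (13), each to within the error of the approximation.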

Neither Torricelli nor Barrow exploited for computational purposes even 
an intuitive form of the fundamental theorem of calculus. Although 
Barrow began his published Geometrical Lectures with a treatment of 
curves that was based on motion concepts, he ended with formally stated 
results having a rigidly geometric and static character. His statement in 
Lecture X of the fundamental theorem may be described as follows (see 
Struik’s source book [8], pp. 253-263 for an English translation of the 
pertinent passage). 

For convenience let the y- and z-axes be oppositely oriented as shown in
Figure 13. Given an increasing positive function y = f(x), denote by
z = A(x) the area between the curve y = f(x) and the segment [0, x] along
the x-axis. Given a point $D(x_0, 0)$ on the x-axis, let T be the point on the
x-axis such that $DT = DF/DE = A(x_0)/f(x_0)$. Then Barrow asserts that
the line TF touches the curve z = A(x) only at the point $F(x_0, A(x_0))$.

Note that the slope of TF is

$$\frac{DF}{DT} = \frac{A(x_0)}{A(x_0)/f(x_0)} = f(x_0).$$


Figure 13


If Barrow were asserting that TF is the tangent line to the curve z = A(x)
in an analytical sense, with appropriately defined slope $A'(x_0)$, this result
would therefore amount to the conclusion that $A'(x_0) = f(x_0)$, the fundamental
theorem of calculus. However, he only asserts (and proves) that TF
is tangent to z = A(x) in the ancient Greek sense of a straight line that
touches the curve at only one point.

To prove this, he considers a point $I(x_1, A(x_1))$ on the curve with $x_1 < x_0$,
and proceeds to show that the point K, of intersection of the horizontal
line IL with TF, lies to the right of I as shown (Fig. 13). To see this, note
that LF/LK = DF/DT = DE (by definition of point T), so LF = LK × DE. But

$$LF = DF - PI = A(x_0) - A(x_1) < DP \times DE$$

because f(x) is an increasing function. Therefore LK × DE < DP × DE, so
LK < DP = LI, as desired. The case $x_1 > x_0$ is similar.
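
Barrow's construction can be tried out on a concrete curve. The sketch below makes several assumptions of its own: the function f(x) = x^2 + 1 is an arbitrary increasing positive example, the z-axis is taken to point upward in the ordinary way (rather than oppositely to y as in Figure 13), and T is placed to the left of D. It builds the line TF from the length DT and measures the vertical gap between the area curve and that line at several abscissas.

```python
def barrow_line(f, A, x0):
    """Return T's abscissa and the line TF as a function, following Barrow's construction."""
    dt = A(x0) / f(x0)                 # DT = DF / DE
    t_abscissa = x0 - dt               # T on the x-axis, assumed to the left of D
    slope = A(x0) / dt                 # DF / DT
    return t_abscissa, (lambda x: slope * (x - t_abscissa))

f = lambda x: x * x + 1                # illustrative increasing positive function for x >= 0
A = lambda x: x ** 3 / 3 + x           # area under f over [0, x]

x0 = 2.0
t_abscissa, line = barrow_line(f, A, x0)

# Vertical gap between the area curve and the line TF away from x0.
gaps = [A(x) - line(x) for x in (0.5, 1.0, 1.5, 2.5, 3.0)]
```

The line passes through F, has slope f(x0), and every gap is positive, so in this example TF meets the curve z = A(x) only at F, which is the geometric content of Barrow's assertion.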

Thus we see that Barrow’s result, which can and has been interpreted as 
an early statement of the fundamental theorem of calculus, was in reality 
formulated and established by him in a spirit more akin to classical 
Euclidean geometry than the emerging calculus of computational algo- 
rithms and processes. 

It may be added that a similar result with similar proof was published
slightly earlier in 1668 by the great young Scottish mathematician James
Gregory (1638-1675), who apparently duplicated (in his unpublished work)
some of the key discoveries of Newton and Leibniz, but died prematurely
before winning proper recognition for his work.


Napier’s Wonderful Logarithms


John Napier (1550-1617) 


The late sixteenth century was an age of numerical computation, as 
developments in astronomy and navigation called for increasingly accurate 
and lengthy trigonometric computations. Georg Joachim Rheticus 
(1514-1576) began the computation of a great collection of 15-place 
trigonometric tables which were completed and published by Otho in 1596 
and by Pitiscus in 1613. The urgent need for some device to shorten the
labor of tedious multiplications and divisions with many decimal places
was met through the invention of logarithms by Napier and others around
the turn of the seventeenth century.

John Napier was the eighth baron (or laird) of Merchiston. He is said to 
have regarded his book A Plaine Discovery of the Whole Revelation of Saint 
John (1593) as his most important contribution. This polemical tract 
contained proofs in Euclidean fashion that the Pope was the Antichrist 
and that the world was due to end in the year 1786. With this theological 
work behind him, he began in 1594 the work that was to revolutionize the 
practical art of numerical computation. This labor occupied a twenty-year 
period spent in the isolation of Merchiston castle near Edinburgh in the 
south of Scotland. 

Napier’s logarithmic tables first appeared in 1614 in a small book 
entitled Mirifici Logarithmorum Canonis Descriptio (Description of the 
Wonderful Canon of Logarithms), which contained only an introduction 
and guide to the computational use of the tables. The method of computa- 
tion of the tables themselves, and to a lesser extent the reasoning upon 
which they were based, were summarized in the Mirifici Logarithmorum 
Canonis Constructio (Construction of the Wonderful Canon of Logarithms),
the first written of the two books, but published posthumously in
1619. Extracts from an 1889 English translation of the Constructio by
W. R. Macdonald may be found in the Napier tercentenary memorial
volume ([NT], pp. 25-32) or in D. J. Struik’s mathematics source book
([11], pp. 11-21).

The practical advantages of using logarithms to convert tedious multiplications
and divisions to comparatively simple additions and subtractions
were immediately obvious. For example, when Kepler received
Napier’s tables of 1614, he enthusiastically employed them in the enor- 
mous computations that led to the discovery of his third law of planetary 
motion. 

Today we think of the logarithm $\log_a x$ of the number x (with base a) as
the power to which a must be raised to obtain x. However, in order to 
properly gauge the magnitude of Napier’s accomplishment, it is important 
to realize that fractional powers and exponential notation had in Napier’s 
time not yet been developed. Neither was the decimal point system of 
numeration generally accepted. Indeed, it was Napier’s systematic use of 
decimal points that was largely responsible for the general adoption 
of decimal point notation during the seventeenth century. 

In particular, we think of the logarithm as a function, or even as the 
inverse of an exponential function. However, Napier’s computations were 
based on a clear understanding of a particular functional relationship at a 
time when the general concept of a function was still unknown. Indeed, the 
logarithm function played a prototype role in the development of this 
general concept. Also, as we shall see in this chapter, the study of 
logarithms led to the calculation of hyperbolic areas (such as the area 
under the rectangular hyperbola xy=1). In these ways the logarithm 
function, in addition to its computational importance, played a significant 
role in the historical development of the calculus. 


The Original Motivation 


The object of Napier’s “wonderful canon of logarithms” was to reduce the 
tedious operation of multiplication to the much simpler operation of 
addition by means of the correspondence between an arithmetic series and 
a geometric series. In his Arithmetica Integra of 1544, Michael Stifel (with 
whose work Napier is likely to have been familiar) set down side-by-side 
the arithmetic and geometric series 


0   1   2   3   4    5    6    7     8
1   2   4   8   16   32   64   128   256


and pointed out that addition in the upper (arithmetic) series corresponds
to multiplication in the lower (geometric) series (see pp. 85-86 of D. E.
Smith’s article “The law of exponents in the works of the sixteenth
century” in [NT]). For example, 3 + 5 = 8 corresponds to 8 × 32 = 256.
Although the lack of exponential notation prevented Stifel from writing
$2^3 \cdot 2^5 = 2^{3+5}$, he referred to the upper numbers as “exponents” of the
lower numbers.

In order that a correspondence between arithmetic and geometric series 
be useful for practical computations, it was obviously necessary that the 
common ratio between successive terms in the geometric series be close to 
unity, in order that the gaps between successive terms would remain small. 
Napier began with this common ratio as 0.9999999 ($= 1 - 10^{-7}$ in modern
exponential notation), and the First Table of the Constructio consists of the
first 101 terms of the geometric series with first term 10,000,000 ($= 10^7$),
that is, the numbers

$$10^7 (1 - 10^{-7})^n, \qquad n = 0, 1, 2, \ldots, 100.$$


He obtained each term from the previous one by an easy subtraction, as 
follows. 


  10000000.0000000
-        1.0000000
   9999999.0000000
-        0.9999999
   9999998.0000001
     continued up to
   9999900.0004950


He called the numbers 0, 1, ..., 100 the logarithms (= ratio numbers)
of the numbers thereby obtained, e.g. 100 is the logarithm of
9999900.0004950. Thus his original idea of the logarithm of a number x
(less than $10^7$) was the number of times that $10^7$ must be multiplied by
$(1 - 10^{-7})$ to yield x. Hence let us write y = Nlog x (the Naperian logarithm
of x) if

$$x = 10^7 (1 - 10^{-7})^y.$$


Note first that Nlog $10^7$ = 0, and that Nlog x increases as x decreases, in
contrast with modern natural logarithms. Thus the frequent designation of
natural logarithms as “Naperian logarithms” is inaccurate.

Next note that, if

$$x' = 10^7 (1 - 10^{-7})^{y'},$$

then

$$\frac{x}{x'} = (1 - 10^{-7})^{y - y'},$$

so the difference of the logarithms of x and x' depends only on the ratio of
x and x' (hence the name logarithm = ratio number, of Greek origin). As
Napier says in Art. 36 of the Constructio, “the logarithms of similarly
proportioned sines are equidifferent.”

It follows that if $x_1, x_2, \ldots, x_n$ is a geometric progression, then the
sequence of logarithms

$$\text{Nlog}\, x_1,\; \text{Nlog}\, x_2,\; \ldots,\; \text{Nlog}\, x_n$$


is an arithmetic progression. This fact was the basis for Napier’s computa- 
tion of his table of logarithms. 
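
The defining relation for Nlog can be inverted with modern logarithms, which makes the two properties just noted easy to observe. The function below is a modern convenience under our own naming and is, of course, not how Napier computed; the geometric progression with ratio 0.9 is an arbitrary example.

```python
import math

def nlog(x):
    """Napier's logarithm in the sense used above: the y with x = 10^7 (1 - 10^-7)^y."""
    return math.log(x / 1e7) / math.log1p(-1e-7)

# A geometric progression with ratio 0.9 starting at 10^7 (illustrative values).
progression = [1e7 * 0.9 ** k for k in range(5)]
logs = [nlog(x) for x in progression]
differences = [q - p for p, q in zip(logs, logs[1:])]
```

The list `logs` starts at 0 and increases as the numbers decrease, and the entries of `differences` coincide up to rounding, so the logarithms of a geometric progression do form an arithmetic progression.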

Obviously Napier could not simply continue the First Table in the above 
fashion, because it would require over 6,900,000 steps to reduce the first 
term 10,000,000 by a factor of two to 5,000,000. 


EXERCISE 1. Use modern logarithms to compute the exact number of steps that
would be required to reach 5,000,000 in this manner. That is, for what n is
$(1 - 10^{-7})^n = \tfrac{1}{2}$?


In the Second Table of the Constructio Napier computes the first 51 
terms of the geometric series with common ratio $(1-10^{-5})$, that is, the 
numbers 

$$10^7(1-10^{-5})^r, \qquad r = 0, 1, 2, \ldots, 50.$$

Again, the successive terms are computed by easy successive subtractions. 


    10000000.000000 
  -      100.000000 
     9999900.000000 
  -       99.999000 
     9999800.001000 
    continued up to 
     9995001.224804 


(Napier erroneously has 9995001.222927 for the last term here.) 

With a common ratio of $(1-10^{-5})$, it would still require over 69,000 
steps to reach 5,000,000. Of course Napier does not intend to continue in 
this way, either. These first two tables are to be used only to interpolate 
between the entries in his Third Table, which has 21 rows and 69 columns, 
the element in the pth row and qth column being 


$$10^7\left(1-\frac{1}{2000}\right)^{p-1}\left(1-\frac{1}{100}\right)^{q-1}.$$

Thus each row is a geometric progression with 69 terms and common 
factor (1 — 1/100), while each column is a geometric progression with 21 
terms and common factor (1 − 1/2000). Finally, the ratio of the last 
number in each column to the first number in that same column is 
$(1 - 1/2000)^{20}$, which is approximately equal to 1 − 1/100 = 99/100, and 
the last number in each column is approximately equal to the first number 
in the next column. The simplified version of this table printed below is 
based on Hobson’s exposition [8]. 


    First column      2nd column       ...    69th column 
    10000000.0000     9900000.0000     ...    5048858.8900 
     9995000.0000     9895050.0000     ...    5046334.4605 
     9990002.5000     9890102.4750     ...    5043811.2932 
         ...              ...          ...        ... 
     9900473.5780     9801468.8423     ...    4998609.4034 


As Napier says, “in the Third Table [between 10000000 and approxi- 
mately 5000000] you have sixty-eight numbers interpolated, in the propor- 
tion of 100 to 99 [between successive terms], and between each two of 
these you have twenty numbers interpolated in the proportion of 10000 to 
9995.” These 21 × 69 = 1449 numbers, fairly evenly interspersed in the 
interval [5000000, 10000000], constitute the basic reference points in the 
sophisticated and ingenious interpolation scheme that follows in the 
Constructio. 
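
The layout of the Third Table can be reproduced directly from the formula for the element in the pth row and qth column. The sketch below is an illustration under stated assumptions: the names `third_table_entry` and `third_table` are hypothetical, and the entries are computed in floating point, so they agree with the printed table only up to the small rounding differences of hand computation.

```python
def third_table_entry(p, q):
    """Entry in row p (1..21) and column q (1..69) of the Third Table."""
    return 1e7 * (1 - 1 / 2000) ** (p - 1) * (1 - 1 / 100) ** (q - 1)


# 21 rows and 69 columns, stored row by row
third_table = [[third_table_entry(p, q) for q in range(1, 70)]
               for p in range(1, 22)]

print(len(third_table) * len(third_table[0]))   # number of reference points
print(round(third_table[20][0], 3))             # foot of the first column
print(round(third_table[0][1], 3))              # head of the second column
print(round(third_table[20][68], 3))            # last entry, just below 5000000
```

Comparing the foot of each column with the head of the next shows how nearly the columns join end to end.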

The logarithms of the numbers in the Third Table above could be 
approximated by linear interpolation. Since the numbers in each row and 
those in each column form a geometric progression, their logarithms form 
an arithmetic progression. Therefore the logarithm of the element in the 
pth row and qth column is 


(p — 1) Nlog 9995000 + (q— 1) Nlog 9900000, (1) 


so it suffices to compute the logarithms of 9995000 and 9900000. This is 
the purpose of the First and Second Tables. 
Extrapolating linearly from the last element of the First Table, we obtain 

$$\operatorname{Nlog} 9999900 = 100 \times (100/99.999505) = 100.000495.$$

Hence if the ratio of two numbers (such as two successive terms in the 
Second Table) is 100000/99999, then the difference of their logarithms is 
100.000495. From the last term of the Second Table we therefore obtain 

$$\operatorname{Nlog} 9995001.22 = 50 \times 100.000495 = 5000.02475,$$

and linear extrapolation from this value gives 

$$\operatorname{Nlog} 9995000 = 5001.24506.$$

Hence if the ratio of two numbers (such as two successive terms of a 
column in the Third Table) is 10000/9995, then the difference of their 
logarithms is approximately 5001.245. From the last term of the first 
column of the Third Table we therefore obtain 


$$\operatorname{Nlog} 9900473.578 = 20(5001.245) = 100024.9.$$

Now 

$$\operatorname{Nlog} 9895523.34 = \operatorname{Nlog}\,[(9900473.58)(0.9995)] = 100024.9 + 5001.25 = 105026.15.$$

Finally linear interpolation between these last two logarithms gives 

$$\operatorname{Nlog} 9900000 = 100503.36.$$


We can now fill in the logarithms of the remaining terms of the Third 
Table using (1). For the last two terms in the 69th column we obtain 

$$\operatorname{Nlog} 5001109.96 = 19(5001.245) + 68(100503.36) = 6929252.14$$

and 

$$\operatorname{Nlog} 4998609.40 = 20(5001.245) + 68(100503.36) = 6934253.38.$$

Linear interpolation between the last two logarithms gives 

$$\operatorname{Nlog} 5000000 = 6931472.12.$$

Actually 

$$\operatorname{Nlog} 5000000 = 6931471.81,$$



so our computations based on linear interpolation are correct to seven 
significant figures. If the ratio of two numbers is 2, then the difference of 
their logarithms is 6931472. 
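
The last steps of this reconstruction can be retraced numerically. The following sketch is illustrative only: it takes the two step values 5001.245 and 100503.36 as given from the text rather than recomputing them, and the helper names (`entry`, `nlog_entry`, `interpolate`) are assumptions introduced here. It applies formula (1) to the last two entries of the 69th column and interpolates linearly to 5,000,000.

```python
import math

NLOG_9995000 = 5001.245     # logarithmic step down a column (value quoted in the text)
NLOG_9900000 = 100503.36    # logarithmic step along a row (value quoted in the text)


def entry(p, q):
    """Third Table entry in row p, column q."""
    return 1e7 * (1 - 1 / 2000) ** (p - 1) * (1 - 1 / 100) ** (q - 1)


def nlog_entry(p, q):
    """Logarithm of the entry in row p, column q, by formula (1)."""
    return (p - 1) * NLOG_9995000 + (q - 1) * NLOG_9900000


def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


x0, y0 = entry(20, 69), nlog_entry(20, 69)
x1, y1 = entry(21, 69), nlog_entry(21, 69)
estimate = interpolate(5_000_000, x0, y0, x1, y1)
exact = 1e7 * math.log(2)   # the value the text quotes as the actual one
print(round(estimate, 2), round(exact, 2))
```

The two printed numbers differ only beyond the seventh significant figure, in line with the remark above.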


EXERCISE 2. Once the logarithms of numbers between 10,000,000 and 5,000,000 
have been computed, show how the value of Nlog 5,000,000 can be used to 
compute the values of logarithms of numbers less than 5,000,000. 


The table of logarithms of the numbers in the Third Table was called by 
Napier his “radical table”. By interpolation between values in the radical 
table he computed his principal table or “canon” of logarithms of sines of 
angles between 0° and 90° at intervals of one minute. It should be pointed 
out that his sine of an angle was the opposite side in a right triangle with 
hypotenuse 10,000,000, so his sines ranged from 0 to 10,000,000; the 
modern definition of trigonometric functions as ratios is due to Euler. 
Hence his sines of angles between 0° and 90° were numbers lying between 
0 and $10^7$. 

For purposes of simple illustration we have outlined above a reconstruction 
of Napier’s “radical table” using linear interpolation. However, Napier 
recognized (at least intuitively) the non-linearity of the logarithm function, 
and therefore employed a somewhat subtler method of interpolation that 
enabled him to assign upper and lower bounds to the value of each desired 
logarithm. His intent was to guarantee accuracy to 7 significant figures, 
although his numerical error at the end of the Second Table made his 
seventh place unreliable. 

For the purpose of this non-linear interpolation, Napier required a 
continuous definition of the logarithm function, rather than a discrete 
definition based on geometric progressions. Our interest here is in his 
definition of the logarithm as a continuous function, rather than his precise 
manner of interpolation, so for further details concerning his interpolation 
scheme we refer the reader to the accounts of Coolidge [5] and Hobson [8]. 


Napier’s Curious Definition 


Napier’s actual logarithmic definition was based on considerations of the 
continuous motion of points along straight lines, no doubt because intui- 
tive conceptions of physical motion provided (at that time) the only usable 
basis for quantitative considerations of continuous variables. For a conjec- 
tured reconstruction of Napier’s thought, from the original consideration 
of arithmetic and geometric progressions to the eventual definition in 
terms of continuous motion, see Lord Moulton’s article “The Invention of 
Logarithms, Its Genesis and Growth” in [NT]. 

This definition involves two points moving along two different lines. The 
first point P starts at the initial point $P_0$ of a segment $P_0O$ of length $10^7$, 
with initial speed $10^7$, and moves toward O, with its speed decreasing in 
such a way that it always equals the remaining distance PO. The second 
point L starts at the initial point $L_0$ of a half-line, and moves to the right 
with constant speed $10^7$ (Fig. 1). Napier then defines the segment $y = L_0L$ 
to be the logarithm of the segment x = PO. As he says in Art. 26 of the 
Constructio, “The logarithm of a given sine is that number which has 
increased arithmetically with the same velocity throughout as that with 
which radius began to decrease geometrically, and in the same time as 
radius has decreased to the given sine” ([11], p. 16). 

Figure 1 

It is informative to explore this somewhat obscure definition in terms of 
what we now call natural logarithms, log x being the power to which e 
must be raised to give x. In calculus notation, the motion of the point P is 
described by the differential equation 

$$\frac{dx}{dt} = -x$$

with the initial condition $x(0) = 10^7$, whose solution is 

$$\log x = -t + \log 10^7$$

or 

$$t = \log\frac{10^7}{x}.$$

The motion of the point L is therefore given by 

$$y = 10^7 t = 10^7 \log\frac{10^7}{x}.$$

If we write y = Nog x for Napier’s logarithm of x, we therefore see that the 
relation between Nog x and the natural logarithm log x is given by 

$$\operatorname{Nog} x = 10^7 \log\frac{10^7}{x}. \tag{2}$$


It is clear from (2) that Napier’s logarithms do not share the (now) usual 
properties of logarithms. For example, Nog $10^7$ = 0, while Nog x increases 
as x decreases in such a way that Nog x → ∞ as x → 0. Nevertheless Nog x 
has alternative properties that facilitate computation in a manner similar to 
the use of “ordinary” logarithms. 
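
Napier's kinematic definition and relation (2) can be compared numerically. The sketch below is a rough illustration, not Napier's procedure: the function name `napier_motion` and the coarse time step `dt` are assumptions chosen for speed. It moves P and L forward in small time steps and compares the distance covered by L with the right-hand side of (2).

```python
import math


def napier_motion(target, dt=1e-4, radius=1e7):
    """Step P and L forward until P's distance from O drops to `target`.

    dt is a coarse time step chosen for speed (an assumption of this sketch).
    """
    x, y = radius, 0.0
    while x > target:
        x -= x * dt          # the speed of P equals its remaining distance x
        y += radius * dt     # L moves with constant speed 10^7
    return x, y


x, y = napier_motion(5e6)
nog = 1e7 * math.log(1e7 / x)   # relation (2)
print(x, y, nog)
```

Shrinking `dt` brings the simulated distance `y` closer to the value given by (2), at the cost of more steps.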


EXERCISE 3. Use (2) and the laws of logarithms ($\log xy = \log x + \log y$, 
$\log x^a = a \log x$) to show that 

(i) $\operatorname{Nog} xy = \operatorname{Nog} x + \operatorname{Nog} y - 10^7 \log 10^7$ 
(ii) $\operatorname{Nog} x^a = a \operatorname{Nog} x + (1-a)\,10^7 \log 10^7$ 
(iii) $\operatorname{Nog}(x/y) = \operatorname{Nog} x - \operatorname{Nog} y + 10^7 \log 10^7$ 

Thus the computational use of a table of “Nogarithms” would involve continual 
addition or subtraction of multiples of $10^7 \log 10 = 23{,}025{,}851$. 


On the basis of the above definition, Napier proceeded with his 
computations in essentially the following way. In successive time intervals of 
length $10^{-7}$, starting at time t = 0, the point L with constant speed $10^7$ 
moves a distance of 1 during each time interval, determining the points 
$L_1, L_2, L_3, \ldots$, with $L_0L_n = n$ (Fig. 2). During the first of these very short 
time intervals the point P moves from $P_0$ to $P_1$ with a speed that is 
decreasing but still almost $10^7$, so $P_0P_1 \approx 1$ and 
$x_1 = P_1O \approx 10^7 - 1 = 10^7(1-10^{-7})$. During the second time interval the 
speed of P is approximately $10^7(1-10^{-7})$, so 

$$x_2 = OP_2 \approx 10^7(1-10^{-7}) - (1-10^{-7}) = 10^7(1-10^{-7})^2.$$

Figure 2 

Continuing in this way, we find that $x_n = OP_n$ is approximately 
$10^7(1-10^{-7})^n$. Thus Napier’s logarithm of $10^7(1-10^{-7})^n$ is approximately n, 

$$\operatorname{Nog} 10^7(1-10^{-7})^n \approx n.$$

That is, if $x = 10^7(1-10^{-7})^n$, then Nog x ≈ n. If we now write $\widetilde{\operatorname{Nog}}\,x = n$ 
(the tilde signifying approximation), then $\widetilde{\operatorname{Nog}}\,x$ is the version of the 
logarithm that we denoted by Nlog x in the previous section, and this is 
the basis for the complicated interpolation scheme described there. 

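By means of ingenious approximations Napier guaranteed that his 
interpolations were accurate (except for the mistake mentioned previously) 
to 7 significant figures. Using (2) and the infinite series 

$$\log(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots$$

that will play an important role later, we see that, if $x = 10^7(1-10^{-7})^n$, 
then 

$$\begin{aligned}
\operatorname{Nog} x &= 10^7 \log\frac{10^7}{10^7(1-10^{-7})^n} \\
&= -10^7 n \log(1-10^{-7}) \\
&= 10^7 n\left(10^{-7} + \frac{10^{-14}}{2} + \frac{10^{-21}}{3} + \cdots\right) \\
&= \left(1 + \frac{10^{-7}}{2} + \cdots\right)\widetilde{\operatorname{Nog}}\,x \\
&\approx 1.00000005\ \widetilde{\operatorname{Nog}}\,x.
\end{aligned}$$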
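
The constant factor in the last line can be checked in one line of floating-point arithmetic. This is only a sanity check under the assumption that double precision suffices; the variable names are chosen here for illustration.

```python
import math

delta = 1e-7
# Ratio of the continuous logarithm to the discrete count n for
# x = 10^7 (1 - 10^-7)^n; by the derivation above it does not depend on n.
ratio = -math.log1p(-delta) / delta
print(ratio)
```

Using `math.log1p` avoids the loss of accuracy that would come from forming 1 − 10⁻⁷ before taking the logarithm.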


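EXERCISE 4. If 

$$x = 10^7(1-10^{-7})^m \quad\text{and}\quad y = 10^7(1-10^{-7})^n,$$

so $\widetilde{\operatorname{Nog}}\,x = m$ and $\widetilde{\operatorname{Nog}}\,y = n$, show that 

$$\widetilde{\operatorname{Nog}}\,xy = \widetilde{\operatorname{Nog}}\,x + \widetilde{\operatorname{Nog}}\,y - Q,$$

where 

$$(1-10^{-7})^Q = 10^{-7}.$$

Then note that 

$$\begin{aligned}
Q &= \log_{1-10^{-7}} 10^{-7} = \widetilde{\operatorname{Nog}}\,1 \\
&= 10^7 \log_{(1-10^{-7})^{10^7}} 10^{-7} && \text{(why?)} \\
&\approx 10^7 \log_{1/e} 10^{-7} \\
&= 10^7 \log_e 10^7 = \operatorname{Nog} 1.
\end{aligned}$$

Thus $\widetilde{\operatorname{Nog}}\,x$ obeys essentially the same additive law as Nog x. 


EXERCISE 5. If $x = 10^7(1-10^{-7})^n$, show that 

$$\begin{aligned}
\widetilde{\operatorname{Nog}}\,x &= \log_{1-10^{-7}}(10^{-7}x) \\
&= 10^7 \log_{(1-10^{-7})^{10^7}}(10^{-7}x) \\
&\approx 10^7 \log_{1/e} 10^{-7}x \\
&= 10^7 \log\frac{10^7}{x}.
\end{aligned}$$


Arithmetic and Geometric Progressions 


As we have seen, the key idea employed by Napier in his logarithmic 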
calculations was that of pairing the terms of an arithmetic progression with 
those of a geometric progression as in the following table. If x and y are 
two numbers whose product is desired, and they are terms of the geometric 
progression, $x = ar^m$ and $y = ar^n$, then their product divided by a, 
$xy/a = ar^{m+n}$, appears in the right-hand column opposite the term (m + n)b in the 
left-hand column. If the number a is a power of 10, so that multiplication 
by a to obtain $xy = ar^m \cdot ar^n = a(ar^{m+n})$ involves simply a shift of the 
decimal point, the table therefore reduces the problem of multiplying x 
and y to the addition of the integers m and n. 


    Arithmetic      Geometric 
    Progression     Progression 
    b               $ar$ 
    2b              $ar^2$ 
    3b              $ar^3$ 
    mb              $ar^m$ 
    nb              $ar^n$ 
Napier’s table involved such a pairing with b = 1, $a = 10^7$, $r = 1 - 10^{-7}$. 
This idea seems to have been “in the air” around the turn of the 
seventeenth century, for in work done simultaneously (but published later than 
Napier’s), the Swiss instrument-maker Jost Bürgi constructed a similar 
table with b = 10, $a = 10^8$, $r = 1 + 10^{-4}$. Napier’s value $r = 1 - 10^{-7}$ and 
Bürgi’s value $r = 1 + 10^{-4}$ were both chosen very close to unity, so that the 
successive entries in the right-hand column (where x and y must be found 
if they are to be multiplied) would be very close together. 


    Napier’s Table                     Bürgi’s Table 
    1     $10^7(1-10^{-7})$            10·1    $10^8(1+10^{-4})$ 
    2     $10^7(1-10^{-7})^2$          10·2    $10^8(1+10^{-4})^2$ 
    3     $10^7(1-10^{-7})^3$          10·3    $10^8(1+10^{-4})^3$ 
    ...   ...                          ...     ... 
    n     $10^7(1-10^{-7})^n$          10·n    $10^8(1+10^{-4})^n$ 


Bürgi continued his table to 23,027 entries, because $(1+10^{-4})^{23027} \approx 10$. 
Since n is the logarithm to the base $(1+10^{-4})$ of $(1+10^{-4})^n$, Bürgi’s table 
was, except for the placement of decimal points, a table of antilogarithms 
to the base $(1+10^{-4})$. 
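
The way such a table turns multiplication into addition can be sketched as follows. This is an illustration with Bürgi's parameters, not a reconstruction of his printed table: the list index n is used in place of his entries 10·n, and the helper names `index_of` and `multiply` are hypothetical.

```python
from bisect import bisect_left

A, R, N = 1e8, 1 + 1e-4, 23027

# Right-hand column: A * R**n for n = 0, 1, ..., N.
# The index n plays the role of the logarithm.
table = [A]
for _ in range(N):
    table.append(table[-1] * R)


def index_of(x):
    """Index of the table entry nearest to x (for A <= x <= table[-1])."""
    i = bisect_left(table, x)
    if i > 0 and (i == len(table) or x - table[i - 1] <= table[i] - x):
        i -= 1
    return i


def multiply(x, y):
    """Approximate x*y as A * table[m + n]; requires m + n <= N."""
    return A * table[index_of(x) + index_of(y)]


print(table[-1])                   # close to 10^9 after 23,027 steps
print(multiply(1.5e8, 2.0e8))      # close to 3 * 10^16
```

Because successive entries differ by one part in ten thousand, a product found by nearest-entry lookup is good to roughly four significant figures; interpolation between entries would be needed for more.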

By an appropriate shift of the decimal points in either Napier’s or 
Bürgi’s table, we can approximate natural (base e) logarithms. For 
example, write 

$$\operatorname{Bog} x = n \times 10^{-4} \quad\text{if}\quad x = (1+10^{-4})^n,$$

and consider the following variant of Bürgi’s table. 


    Bog x                  x 
    $1 \times 10^{-4}$     $(1+10^{-4})^1$ 
    $2 \times 10^{-4}$     $(1+10^{-4})^2$ 
    ...                    ... 
    $n \times 10^{-4}$     $(1+10^{-4})^n$ 


To see what “Bogs” really are, write $m = n \times 10^{-4}$. Then 

$$x = (1+10^{-4})^n = \left[(1+10^{-4})^{10^4}\right]^m,$$

so 

$$\operatorname{Bog} x = n \times 10^{-4} = m = \log_{(1+10^{-4})^{10^4}} x.$$

But $(1+10^{-4})^{10^4} = 2.718 \approx e$ to 4 significant figures, so Bog x is essentially 
the natural logarithm of x. This answers the question, properly posed by 
students, as to precisely what is “natural” about natural logarithms. 
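
A short computation makes the agreement between “Bogs” and natural logarithms concrete. The sketch below is illustrative only; the function name `bog_table` is an assumption, and the table is built in floating point by repeated multiplication.

```python
import math


def bog_table(entries):
    """Pairs (Bog x, x) with x = (1 + 10^-4)^n and Bog x = n * 10^-4."""
    rows, x = [], 1.0
    for n in range(1, entries + 1):
        x *= 1 + 1e-4
        rows.append((n * 1e-4, x))
    return rows


rows = bog_table(10_000)
bog, x = rows[6930]            # n = 6931, where x is close to 2
print(bog, x, math.log(x))     # Bog x against the natural logarithm of x
print(rows[-1][1], math.e)     # (1 + 10^-4)^(10^4) against e
```

The two logarithms agree to about four decimal places, which reflects how closely the base $(1+10^{-4})^{10^4}$ approximates e.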


EXERCISE 6. Motivated by Napier’s table, write nog x = $n \times 10^{-7}$ if $x = (1-10^{-7})^n$. 
Then show that nog x is essentially the logarithm of x with base 1/e. Hint: 
$(1-10^{-7})^{10^7} \approx 1/e$. 


EXERCISE 7. Show that Bogs satisfy the laws of logarithms, 


Bog xy = Bog x + Bog y, 
Bog $x^a$ = a Bog x. 


The Introduction of Common Logarithms 


In 1615 the English mathematics professor Henry Briggs visited Napier in 
Scotland, and their discussions led to Briggs’ construction of a table of 
“improved” logarithms, ones for which the logarithm of one is zero and the 
logarithm of ten is one. These are now called “common” or base 10 
logarithms. 

Briggs immediately began the computation of these improved loga- 
rithms, which had more useful computational properties, and in 1624 
published the Arithmetica Logarithmica, a table of 14-place common loga- 
rithms of the first 20,000 integers and of those from 90,000 to 100,000. The 
gap between 20,000 and 90,000 was filled by the Dutchman Adrian Vlacq, 
who published in 1628 the table of 10-place common logarithms from 1 to 
100,000 that was to constitute the basis for nearly all logarithm tables for 
the next three centuries. 

If Briggs’ common logarithm Log x of x is defined in terms of Napier’s 
logarithm by the transformation 

$$\operatorname{Log} x = \frac{\operatorname{Nog} 1 - \operatorname{Nog} x}{\operatorname{Nog} 1 - \operatorname{Nog} 10}, \tag{3}$$

then it is obvious that Log 1 = 0 and Log 10 = 1. 


EXERCISE 8. Use the fact that Nog xy = Nog x + Nog y − Nog 1 and (3) to show 
that the common logarithm Log x satisfies the law of logarithms, 


Log xy = Log x + Log y. 


It follows from the law of logarithms that common logarithms enjoy the 
useful property that numbers differing only by location of the decimal 
point have logarithms differing by an integer, 


$$\operatorname{Log} 10^n x = n + \operatorname{Log} x.$$

For example, 

    Log 2   = 0.30103, 
    Log 20  = 1.30103, 
    Log 200 = 2.30103. 


Instead of using the above transformation (3) to convert Napier’s table 
to a table of common logarithms, Briggs recomputed the whole table using 
a different method. He began by calculating successive square roots of 10. 
Starting with Log 10=1, he obtained the logarithm of each root by halving 
the logarithm of the previous root, as follows. 


    x                          Log x 
    10                         1.0000 
    $10^{1/2}$  = 3.16228      0.5000 
    $10^{1/4}$  = 1.77828      0.2500 
    $10^{1/8}$  = 1.33352      0.1250 
    $10^{1/16}$ = 1.15478      0.0625 


After 54 such square root extractions (each carried out to 30 decimal 
places), he obtained a number $a = 10^{1/2^{54}}$ very slightly greater than one, 
for which 

$$\operatorname{Log} a = \frac{1}{2^{54}}.$$

By repeated application of the law of logarithms, he then built up a table 
of logarithms of closely spaced numbers, the first table of “common 
logarithms.” 
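
The repeated square roots are easy to imitate with arbitrary-precision decimals. The sketch below is an illustration, not Briggs' own arithmetic: the function name `briggs_roots` and the working precision of 40 digits are assumptions made here.

```python
from decimal import Decimal, getcontext

getcontext().prec = 40   # a few digits beyond the 30 places mentioned above


def briggs_roots(count):
    """Successive square roots of 10 paired with their common logarithms."""
    x, log = Decimal(10), Decimal(1)
    rows = [(x, log)]
    for _ in range(count):
        x, log = x.sqrt(), log / 2   # each root halves the logarithm
        rows.append((x, log))
    return rows


rows = briggs_roots(54)
for x, log in rows[:5]:
    print(f"{x:.5f}  {log:.4f}")
a, log_a = rows[-1]
print(a, log_a)   # a is barely greater than 1; Log a is 1/2^54
```

The first five printed rows correspond to the small table above, and the final pair is the number a with its logarithm.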


Logarithms and Hyperbolic Areas 


The tables of Napier and Briggs and their followers revolutionized the art 
of numerical computation. However, the importance of logarithms in the 
historical development of the calculus stems from a discovery published in 
1647 by the Belgian Jesuit Gregory St. Vincent, that implies a surprising 
connection between the natural logarithm function and the rectangular 
hyperbola xy = 1. 

If [a, b] is a closed interval on the positive axis, denote by $A_{a,b}$ the area 
of the region that lies over this interval and under the hyperbola xy = 1 
(Fig. 3). Then what Gregory discovered may be stated as follows. If t > 0, 
then 

$$A_{ta,tb} = A_{a,b}. \tag{4}$$

Figure 3 

To see why this is true, let 

$$a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b$$

be equally-spaced points subdividing the interval [a, b] into a large number 
n of sub-intervals, and above these sub-intervals construct inscribed and 
circumscribed rectangles as indicated in Fig. 4. Then the inscribed and 
circumscribed rectangles over the ith subinterval of [a, b] have base 
(b − a)/n and heights $1/x_i$ and $1/x_{i-1}$, respectively. Therefore 


$$\sum_{i=1}^{n}\frac{b-a}{nx_i} < A_{a,b} < \sum_{i=1}^{n}\frac{b-a}{nx_{i-1}}. \tag{5}$$

Now the points 

$$ta = tx_0 < tx_1 < \cdots < tx_{n-1} < tx_n = tb$$

similarly subdivide the interval [ta, tb] into n equal subintervals. The 
inscribed and circumscribed rectangles over the ith subinterval $[tx_{i-1}, tx_i]$ 
of [ta, tb] have base (tb − ta)/n and heights $1/tx_i$ and $1/tx_{i-1}$, 
respectively. Hence their areas are equal to those of the inscribed and 
circumscribed rectangles over $[x_{i-1}, x_i]$. Therefore 

$$\sum_{i=1}^{n}\frac{b-a}{nx_i} < A_{ta,tb} < \sum_{i=1}^{n}\frac{b-a}{nx_{i-1}}. \tag{6}$$



Comparison of (5) and (6) makes evident the truth of (4). 
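
The equality of the rectangle sums over [a, b] and [ta, tb] can be observed numerically. The following sketch is illustrative; the function name `rectangle_sums` and the sample values of a, b, t and n are assumptions. The natural logarithm of b/a is printed only for comparison, anticipating the identification of the area made later in this chapter.

```python
import math


def rectangle_sums(a, b, n):
    """Inscribed and circumscribed rectangle sums for xy = 1 over [a, b]."""
    width = (b - a) / n
    xs = [a + i * width for i in range(n + 1)]
    inscribed = sum(width / xs[i] for i in range(1, n + 1))
    circumscribed = sum(width / xs[i - 1] for i in range(1, n + 1))
    return inscribed, circumscribed


a, b, t, n = 1.0, 3.0, 5.0, 1000
low, high = rectangle_sums(a, b, n)
low_t, high_t = rectangle_sums(t * a, t * b, n)
print(low, high)          # bounds (5) for the area over [a, b]
print(low_t, high_t)      # bounds (6) for the area over [ta, tb]
print(math.log(b / a))    # for comparison only
```

Up to floating-point rounding the two pairs of bounds coincide, which is exactly what the comparison of (5) and (6) asserts.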
Figure 5 


EXERCISE 9. Complete the above argument to rigorously prove by Archimedes’ 
method of compression that $A_{a,b} = A_{ta,tb}$. 


In reading through Gregory’s Opus Geometricum, his friend A. A. de 
Sarasa noticed that Equation (4) implies that a certain area function 
associated with the hyperbola xy = 1 has the additive property that is 
characteristic of logarithms. Let 

$$L(x) = \begin{cases} A_{1,x} & \text{if } x \ge 1, \\ -A_{x,1} & \text{if } 0 < x < 1. \end{cases}$$

Then L(x) satisfies the “law of logarithms,” 

$$L(xy) = L(x) + L(y). \tag{7}$$

For example, if x and y are both greater than 1, then 

$$\begin{aligned}
L(xy) &= A_{1,xy} \\
&= A_{1,x} + A_{x,xy} && \text{(see Fig. 5)} \\
&= A_{1,x} + A_{1,y} && \text{(by Eq. (4))}, \\
L(xy) &= L(x) + L(y).
\end{aligned}$$


EXERCISE 10. Establish Equation (7) in the cases 

$$0 < x < y < 1 \quad\text{and}\quad 0 < x < 1 < y.$$


EXERCISE 11. If $1 < a_1 < a_2 < \cdots < a_n < \cdots$ is a geometric progression, apply 
Equation (4) to show that 

$$0 < L(a_1) < L(a_2) < \cdots < L(a_n) < \cdots$$

is an arithmetic progression. 


Thus the hyperbolic area function L(x) “looks like a logarithm,” in that 
it provides a pairing between geometric and arithmetic progressions, so it is 
natural to inquire as to its relation to the natural logarithm function log x. 
It is, in fact, true that L(x) = log x, although this relationship was not fully 
clarified until the time of Euler in the eighteenth century. 

However, using a bit of calculus, we can “unmask” the function L(x), by 
computing its derivative as follows. 

$$\begin{aligned}
L'(x) &= \lim_{h\to 0}\frac{L(x+h)-L(x)}{h} \\
&= \lim_{h\to 0}\frac{1}{h}\,L\!\left(1+\frac{h}{x}\right) && \text{(using (7))} \\
&= \frac{1}{x}\lim_{h\to 0}\frac{x}{h}\,L\!\left(1+\frac{h}{x}\right) \\
&= \frac{1}{x}\lim_{k\to 0}\frac{1}{k}\,L(1+k) && \left(k=\frac{h}{x}\right) \\
&= \frac{1}{x}\lim_{k\to 0}\frac{L(1+k)-L(1)}{k} && \text{(because } L(1)=0\text{)} \\
&= \frac{L'(1)}{x}.
\end{aligned}$$

It remains only to compute the single value 

$$L'(1) = \lim_{h\to 0}\frac{L(1+h)}{h} = \lim_{h\to 0}\frac{A_{1,1+h}}{h}$$

of the derivative of L. Consulting Fig. 6, we see that 

$$\frac{h}{1+h} < A_{1,1+h} < h,$$

so 

$$\frac{1}{1+h} < \frac{A_{1,1+h}}{h} < 1.$$

Figure 6 

Taking the limit as h → 0, it is clear that L′(1) = 1, so 

$$L'(x) = \frac{1}{x}.$$


Since L(x) and log x thus have the same derivative 1/x, as well as the 
same value L(1)=log 1=0 at x =1, it follows by elementary calculus that 
L(x)= log x. 


EXERCISE 12. Mimic the above computation of L′(x) to show that the derivative of 
log x is 

$$D \log x = \frac{\log e}{x},$$

where 

$$e = \lim_{k\to 0}(1+k)^{1/k}.$$


Newton’s Logarithmic Computations 


Although the precise relationship between logarithms and hyperbolic areas 
was not understood in the early seventeenth century (nor, for that matter, 
were natural logarithms recognized as logarithms to the base e), the general 
logarithmic character of the hyperbolic area function (as noticed by de 
Sarasa) served to stimulate the study of hyperbolic areas, and these 
investigations played a significant role in the introduction of infinite series 
and algorithmic calculus techniques, beginning in the 1650s and 1660s. 

Apparently the first systematic computations of logarithms as hyperbolic 
areas were carried out by Newton in the mid 1660s. In a manuscript 
probably written in 1667 (see pp. 184-189 of Vol. II of Newton’s Mathe- 
matical Papers cited in the references to Chapter 8), he starts with the 
hyperbola 

$$y = \frac{1}{1+x}$$

and calculates the area A(1 + x) lying under the hyperbola and over the 
interval [0, x] (or the negative of this area if −1 < x < 0); see Figure 7. 
Writing 

$$y = \frac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots,$$

this infinite series resulting from mechanical long division of 1 + x into 1 
(as will be discussed in Chapter 7), he integrates term by term to obtain 

$$A(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots. \tag{8}$$

Figure 7 


Of course A(1 + x) =log(1+ x), the natural logarithm of 1+ x. Although 
Newton does not refer to A(1+x) as a logarithm, he recognizes its 
logarithmic character (perhaps from a direct or indirect acquaintance with 
the results of Gregory St. Vincent). For, referring to the points labeled in 
Figure 8, he says 


Now since the lines ad, ae, etc.: beare such respect to yᵉ [areas] bcdf, bche, 
etc: as numbers do to their logarithms; (viz: as yᵉ lines ad, ae, etc.: 
increase in Geometrical Progression, so yᵉ superfices bcfd, bche, etc.: 
increase in Arithmetical Progression): Therefore if any two or more of 
those lines multiplying or dividing one another doe produce some other 
line ak, their correspondent [areas], added or subtracted one to or from 
another shall produce yᵉ [area] bcgk correspondent to yᵉ line ak. 


Figure 8 

Thus Newton thinks of the hyperbolic area over [0, x] as “correspondent” 
to the line of length 1 + x (hence our notation A(1 + x)), and he 
asserts that 

$$A\big((1+x)(1+y)\big) = A(1+x) + A(1+y),$$

$$A\!\left(\frac{1+x}{1+y}\right) = A(1+x) - A(1+y),$$


the laws of logarithms. On the basis of these formulas he proceeds to 
calculate a small table of logarithms of integers. 

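First, taking x = ±0.1, ±0.2 in (8), he calculates A(0.8), A(0.9), 
A(1.1), A(1.2) (to 57 decimal places!). Next he notes that 

$$\begin{aligned}
2 &= \frac{1.2 \times 1.2}{0.8 \times 0.9}, \\
3 &= \frac{1.2 \times 2}{0.8}, \\
5 &= \frac{2 \times 2}{0.8}, \\
11 &= 10 \times 1.1, \\
10 &= 2 \times 5, \\
100 &= 10 \times 10,
\end{aligned}$$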


so that he can obtain A(2), A(3), A(5), A(11), A(10), A(100) merely by 
addition and subtraction, e.g. 


A(2) = 2A(1.2) — A(0.8) — A(0.9). 


Next he substitutes x = ±0.02, ±0.001 into (8) to calculate A(0.98), 
A(1.02), A(0.999), A(1.001). This permits him to calculate the logarithms 
of 7, 13, 17, because 

$$7 = \sqrt{\frac{100 \times 0.98}{2}} \qquad \left(\text{so } A(7) = \tfrac{1}{2}\,[A(100) + A(0.98) - A(2)]\right),$$

$$13 = \frac{1000 \times 1.001}{7 \times 11},$$

$$17 = \frac{100 \times 1.02}{6}.$$

In order to check the accuracy of his computations, Newton calculates 
A(0.9984) in two different ways: first, by substituting x = −0.0016 into 
(8), and then by noting the factorization 

$$0.9984 = \frac{2^8 \times 3 \times 13}{10^4},$$

so 

$$A(0.9984) = 8A(2) + A(3) + A(13) - 4A(10).$$


He finds (with evident pleasure) that the two results agree to more than 50 
decimal places. 
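
Newton's scheme can be retraced with arbitrary-precision decimals. The sketch below is an illustration under stated assumptions: the function `A`, the working precision of 60 digits, and the fixed count of 400 series terms are choices made here, not Newton's. It sums series (8) for the arguments near 1 and then builds the logarithms of the integers by addition and subtraction, finishing with the two-way check on A(0.9984).

```python
from decimal import Decimal, getcontext

getcontext().prec = 60


def A(value, terms=400):
    """Hyperbolic area A(1 + x) from series (8); `value` is 1 + x as a string."""
    x = Decimal(value) - 1
    total, power = Decimal(0), Decimal(1)
    for k in range(1, terms + 1):
        power *= x                                   # power is now x**k
        total += power / k if k % 2 else -power / k  # alternating signs of (8)
    return total


A2 = 2 * A("1.2") - A("0.8") - A("0.9")
A3 = A("1.2") + A2 - A("0.8")
A5 = 2 * A2 - A("0.8")
A10 = A2 + A5
A11 = A10 + A("1.1")
A100 = 2 * A10
A7 = (A100 + A("0.98") - A2) / 2
A13 = 3 * A10 + A("1.001") - A7 - A11

direct = A("0.9984")                        # x = -0.0016 substituted into (8)
combined = 8 * A2 + A3 + A13 - 4 * A10      # from 0.9984 = 2^8 * 3 * 13 / 10^4
print(A2)
print(abs(direct - combined))
```

The printed difference is a measure of accumulated rounding only, since both routes compute the same quantity.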
EXERCISE 13. In each of the following, start by finding the multiple of the given 
number that is closest to 1000. 


(a) Express the logarithm of 37 in terms of those of 3, 10, and 0.999. 

(b) Express the logarithm of 19 in terms of those of 2, 13, and 0.988. How would 
you compute the logarithm of 0.988? 

(c) Express the logarithm of 31 in terms of those of 2, 10, and 0.992. How would 
you compute the logarithm of 0.992? 


Mercator’s Series for the Logarithm 


The Logarithmotechnia of Nicolas Mercator (1620-1687) was published in 
1668. The first two parts of this book were devoted entirely to the 
calculation of a table of common logarithms. Mercator’s intuitive 
approach was to insert 10 million geometrical means (he called them 
ratiunculae) between 1 and 10; the logarithm of the number x ∈ (1, 10) is 
then $10^{-7}$ times the number of ratiunculae between 1 and x. 


EXERCISE 14. Consider the geometric sequence 

$$1 = r^0,\ r^1,\ r^2,\ \ldots,\ r^n = 10.$$

If $x = r^k$, show that $\operatorname{Log}_{10} x = k/n$. 


To give an idea of Mercator’s approach, he starts by calculating 
$\operatorname{Log}_{10} 1.005$ as follows. First he successively squares g = 1.005 and finds that 
$g^{256} < 10 < g^{512}$, and then he narrows this to 

$$9.965774 = g^{461} < 10 < g^{462} = 10.015603.$$

Interpolation then gives $g^{461.6868} = 10$, so the number of ratiunculae 
between 1 and 1.005 is $10^7/461.6868 = 21{,}659.7$, and the common logarithm 
of 1.005 is 0.00216597 (his value; actually $\operatorname{Log}_{10} 1.005 = 0.00216606$). With 
this computation as a base, he proceeds to give directions for the practical 
computation of a complete table of common logarithms. See the article by 
Hofmann [9] for further details. 
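
The bracketing and interpolation just described can be sketched in a few lines. This is an illustration, not Mercator's procedure in detail: the function name `mercator_log10` is hypothetical, the powers are found by plain repeated multiplication rather than successive squaring, and a floating-point run need not reproduce the figures he reported digit for digit.

```python
import math


def mercator_log10(g):
    """Estimate Log_10 g by bracketing 10 between consecutive powers of g."""
    k, power = 0, 1.0
    while power * g < 10:
        power *= g
        k += 1
    lower, upper = power, power * g                  # g^k < 10 <= g^(k+1)
    exponent = k + (10 - lower) / (upper - lower)    # linear interpolation
    return k, exponent, 1 / exponent


k, exponent, log_g = mercator_log10(1.005)
print(k, exponent, log_g, math.log10(1.005))
```

The reciprocal of the interpolated exponent is the estimate of the common logarithm, since $g^{\text{exponent}} \approx 10$.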


EXERCISE 15. By successively squaring on a pocket calculator, obtain the following 
powers of g = 1.005. 

    $g$   = 1.00500000        $g^{16}$  = 1.08307115 
    $g^2$ = 1.01002500        $g^{32}$  = 1.17304313 
    $g^4$ = 1.02015050        $g^{64}$  = 1.37603017 
    $g^8$ = 1.04070705        $g^{128}$ = 1.89345904 

Next calculate 

$$g^{138} = g^{128}g^8g^2 = 1.99029078,$$
$$g^{139} = 1.005\,g^{138} = 2.00024224,$$

and then interpolate between these two values to obtain 

$$g^{138.97565824} \approx 2.$$

Finally use the (corrected) value $\operatorname{Log}_{10} 1.005 = 0.00216606$ calculated by Mercator 
to obtain 

$$\operatorname{Log}_{10} 2 = 138.97565824\ \operatorname{Log}_{10} 1.005 = 0.301030.$$


It is the very different third part of the Logarithmotechnia that is now of 
principal interest. Here Mercator finds his famous series (apparently used 
previously by Newton, as we have seen) 

$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \tag{8}$$

for the area under the hyperbola y = 1/(1 + x) over the interval from 0 to x. 

He starts by computing by long division the geometric series 

$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots. \tag{9}$$

It is sometimes incorrectly stated that Mercator obtained (8) from (9) by 
simple termwise integration, but he actually computed the area of the 
hyperbolic segment by a technique based on Cavalieri’s indivisibles. 
Mercator only briefly alludes to the details, but a clearer exposition was 
presented by Wallis in his review of the Logarithmotechnia that was 
published in the Philosophical Transactions of 1668. In modern terms, the 
computation that Wallis outlines is roughly as follows (compare the 
discussion in Coolidge [4]). 
Let us subdivide the interval [0, x] into n equal subintervals each of 
length h = x/n, and construct the circumscribed rectangles based on these 
subintervals (Fig. 9), with heights 

$$1,\ \frac{1}{1+h},\ \frac{1}{1+2h},\ \ldots,\ \frac{1}{1+(n-1)h}.$$

Figure 9 

Expanding each of these heights in a geometric series, we find that the 
desired area 

$$\begin{aligned}
A &\approx h + \sum_{j=1}^{n-1}\frac{h}{1+jh} \\
&= h + h\sum_{k=0}^{\infty}(-1)^k h^k + h\sum_{k=0}^{\infty}(-1)^k (2h)^k + \cdots + h\sum_{k=0}^{\infty}(-1)^k \big((n-1)h\big)^k.
\end{aligned}$$

Collecting terms containing equal powers of h, we obtain 

$$\begin{aligned}
A &\approx nh - h\,[h + 2h + \cdots + (n-1)h] + h\,[h^2 + (2h)^2 + \cdots + ((n-1)h)^2] \\
&\qquad - \cdots + (-1)^k h\,[h^k + (2h)^k + \cdots + ((n-1)h)^k] + \cdots \\
&= x - h^2[1 + 2 + \cdots + (n-1)] + h^3[1^2 + 2^2 + \cdots + (n-1)^2] \\
&\qquad - \cdots + (-1)^k h^{k+1}[1^k + 2^k + \cdots + (n-1)^k] + \cdots \\
&= x - \frac{x^2}{n^2}\sum_{i=1}^{n-1} i + \frac{x^3}{n^3}\sum_{i=1}^{n-1} i^2 - \cdots,
\end{aligned}$$

substituting h = x/n. 

Now in his Arithmetica infinitorum of 1656, Wallis had shown (by 
analogy with explicit computations for k < 10) that 

$$\lim_{n\to\infty}\frac{1^k + 2^k + \cdots + n^k}{n^{k+1}} = \frac{1}{k+1} \qquad (n \text{ terms in numerator}).$$

Taking the termwise limit as n → ∞ of the last series above, we therefore 
obtain Mercator’s series 

$$A = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots. \tag{8}$$


Wallis mentions that x < 1 is necessary for convergence. 
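
The three ingredients of this computation (the circumscribed rectangles, Wallis's limit, and the resulting series) can each be examined numerically. The sketch below is illustrative only; the function names and the sample values x = 0.5, n = 10,000 and k = 3 are assumptions made here.

```python
import math


def circumscribed_area(x, n):
    """Sum of the n circumscribed rectangles under y = 1/(1 + t) over [0, x]."""
    h = x / n
    return sum(h / (1 + j * h) for j in range(n))


def mercator_partial_sum(x, terms):
    """x - x^2/2 + x^3/3 - ... truncated after `terms` terms."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))


def wallis_ratio(k, n):
    """(1^k + 2^k + ... + n^k) / n^(k+1), which tends to 1/(k+1)."""
    return sum(i ** k for i in range(1, n + 1)) / n ** (k + 1)


x = 0.5
print(circumscribed_area(x, 10_000), mercator_partial_sum(x, 60), math.log1p(x))
print(wallis_ratio(3, 10_000))
```

For x close to 1 the partial sums converge much more slowly, which gives some feeling for Wallis's remark on convergence.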
As a consequence of the work of Gregory St. Vincent and de Sarasa, it 
seems to have been generally known in the 1660s that the area of a 
segment under the hyperbola y = 1/x is proportional to the logarithm of 
the ratio of the ordinates at the ends of the segment. In a note by Mercator 
himself in the Philosophical Transactions of 1668, the logarithms de- 
termined by hyperbolic segments are referred to as natural logarithms, and 
he supplies the factor 0.43429 ($= 1/\log_e 10$) for transforming from natural 
to common logarithms (see Hofmann [9]). 


EXERCISE 16. Show that the series (8) above is then the natural logarithm of (1 + x). 


EXERCISE 17. Let $L_1$ and $L_2$ be two “logarithm functions” having the property that 
$L_i(x) = aL_i(y)$ if $x = y^a$. Then show that the functions $L_1$ and $L_2$ are proportional, 
i.e. 

$$\frac{L_1(x)}{L_2(x)} = \frac{L_1(y)}{L_2(y)}.$$


In regard to the inverse relation between the exponential and logarithmic 
concepts, Cajori ([2], p. 37) traces it back to Wallis’ Algebra of 1685. 
Wallis considers the progressions 

$$1,\ r,\ r^2,\ r^3,\ \ldots$$
$$0,\ 1,\ 2,\ 3,\ \ldots$$

and remarks that “These exponents they call logarithms, which are artificial 
numbers, so answering to the natural numbers, as that the addition 
and subduction (i.e. subtraction) of these answers to the multiplication and 
division of the natural numbers.” 
The Arithmetic of the Infinite 


Introduction 


Two main streams of discovery fueled the seventeenth century mathemati- 
cal revolution and culminated in the synthesis of a powerful new infinitesi- 
mal analysis. One was the rich amalgam of specialized area and tangent 
methods from which the basic general algorithms of the calculus were 
distilled by Newton and Leibniz. The other centered on the development 
and application of infinite series techniques. 

These two tools, the calculus and the analysis of infinite series, rein- 
forced each other in their simultaneous development, because each served 
to broaden the range of application of the other. For example, in order to 
apply the early calculus methods to transcendental or “mechanical” func- 
tions, it was often necessary to express these functions as infinite series 
that could be differentiated or integrated termwise. Thus, if the function 
f(x) could be “expanded” as an infinite (power) series, 


$$f(x) = a_0 + a_1x + a_2x^2 + \cdots = \sum_{n=0}^{\infty} a_nx^n,$$


then presumably its derivative (or integral) could be calculated by differen- 
tiating (or integrating) each term of the series individually, just as though 
f(x) were a (finite) polynomial in x. If the validity of this process is not 
critically questioned (and, in the seventeenth century, it was not), then the 
result is immediate, 


$$f'(x) = a_1 + 2a_2x + \cdots = \sum_{n=1}^{\infty} na_nx^{n-1}.$$


In short, the elementary techniques of calculus, as they applied to simple 
polynomials, could in this way be applied to any function for which an 
infinite power series expansion was available. At the same time, the 
termwise differentiation or integration of a known infinite series yielded a 
new one. 

As an example, we saw in Chapter 6 that the quadrature of the 
hyperbola y=1/(1+ x), a problem that is not amenable to elementary 
exhaustion or indivisibles methods, was achieved by termwise integration 
of the geometric series 


$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots,$$

thereby yielding Mercator’s series for the logarithm function, 

$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots.$$

The infusion of infinite series into the “analytic art” of the seventeenth 
century raised immediate questions as to their behavior with respect to the 
ordinary algebraic processes of addition, subtraction, multiplication, divi- 
sion, and the extraction of roots. Could one properly manipulate infinite 
series in essentially the same ways that computations with ordinary alge- 
braic expressions (i.e., polynomials) are carried out? We will see in this 
chapter that these questions were answered in the affirmative with the 
conclusion that (subject to convergence questions upon which the seven- 
teenth century did not dwell) the algebra of infinite series obeys the same 
laws as the algebra of finite algebraic quantities. 

The central event in this process of “legalizing” the use and enjoyment 
of infinite series was Newton’s discovery of his famous binomial series. In 
modern notation, the binomial series takes the form 


$$(1+x)^a = 1 + \binom{a}{1}x + \binom{a}{2}x^2 + \cdots = \sum_{n=0}^{\infty} \binom{a}{n} x^n, \tag{1}$$

where $a$ is an arbitrary real number and the “binomial coefficients” are
defined by

$$\binom{a}{n} = \frac{a(a-1)\cdots(a-n+1)}{n!}. \tag{2}$$


The necessary condition |x| <1 for the convergence of the binomial series 
was not stated by Newton. 
In case the exponent a is a positive integer, (2) may be rewritten as 


$$\binom{a}{n} = \frac{a!}{n!\,(a-n)!} \tag{3}$$

for $n \le a$, but is 0 for $n > a$, so (1) reduces to a (finite) polynomial, such as
the familiar cubic

$$(1+x)^3 = 1 + 3x + 3x^2 + x^3.$$

However, if $a$ is not a positive integer, then (1) gives an infinite series
expansion of $(1+x)^a$, such as

$$\frac{1}{1+x} = (1+x)^{-1} = 1 - x + x^2 - x^3 + \cdots$$

or

$$\sqrt{1+x} = (1+x)^{1/2} = 1 + \tfrac{1}{2}x - \tfrac{1}{8}x^2 + \tfrac{1}{16}x^3 - \cdots.$$
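
Formula (2) is easy to evaluate exactly. The sketch below is an illustration in Python with exact rational arithmetic; the names `binom` and `binomial_series` are ours, and the numerical comparison assumes |x| < 1.

```python
from fractions import Fraction

def binom(a, n):
    """Generalized binomial coefficient a(a-1)...(a-n+1)/n! as in formula (2)."""
    result = Fraction(1)
    for k in range(n):
        result *= Fraction(a) - k   # next factor of the numerator
        result /= k + 1             # next factor of n!
    return result

def binomial_series(a, x, terms):
    """Partial sum of the binomial series (1); intended for |x| < 1."""
    return sum(float(binom(a, n)) * x ** n for n in range(terms))

if __name__ == "__main__":
    half = Fraction(1, 2)
    print([str(binom(half, n)) for n in range(5)])
    print(binomial_series(half, 0.2, 40), 1.2 ** 0.5)
```

For a positive integer exponent the coefficients vanish beyond n = a, so the same function reproduces the finite expansion.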


In order to appreciate the magnitude of Newton’s accomplishment in 
formulating the binomial series in 1665, we must view it in the perspective 
of two pertinent historical facts. The first is that the use of non-integral 
exponents was then unknown. The modern exponential notation for positive
integral powers (i.e., $A^3$ instead of Viète’s “A cubus”) had been
introduced by Descartes in his La Géométrie of 1637, and was in general (if
not universal) use by the 1660s. In his Arithmetica infinitorum of 1655
Wallis had mentioned negative and fractional “indices”: he spoke of the
series $\sqrt{1}, \sqrt{8}, \sqrt{27}, \ldots$ as having the “index 3/2,” and the series
$1/\sqrt{1}, 1/\sqrt{2}, 1/\sqrt{3}, \ldots$ as having the “index −1/2.” However,
according to Cajori [3], the explicit use of fractional and negative expo- 
nents first appeared (publicly) in Newton’s statement of his binomial 
series. 

The second thing to remember is that the binomial formula for the case 
of positive integral powers was not then known in a form that suggested its 
generalization to negative or fractional powers. The binomial coefficients 
were not known in terms of a simple formula such as (3), but rather in 
terms of the entries in “Pascal’s triangle” (which apparently dates back to 
medieval times), in which each entry is the sum of the two entries 
immediately above and to either side. The entries in the nth row (starting 
at 0 and counting down from the top) are then the coefficients of the 
powers of x in the binomial expansion of $(1+x)^n$.


                  1
                1   1
              1   2   1
            1   3   3   1
          1   4   6   4   1
        1   5  10  10   5   1


If the elements of Pascal’s triangle are arranged as a matrix (Table 1) 
with ones in the 0th row and column, then the “law of formation” of this
matrix gives the element $b_{p,q}$ in the pth row and qth column as

$$b_{p,q} = b_{p,q-1} + b_{p-1,q}, \tag{4}$$

the sum of the two elements immediately above and to the left of $b_{p,q}$. The
binomial formula for positive integral exponents can then be written as

$$(1+x)^n = \sum_{p=0}^{n} b_{p,n-p}\,x^p. \tag{5}$$

Table 1

    p \ q    0    1    2    3    4
      0      1    1    1    1    1
      1      1    2    3    4    5
      2      1    3    6   10   15
      3      1    4   10   20   35
      4      1    5   15   35   70
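
The law of formation (4) and the reading of binomial coefficients along anti-diagonals in (5) can be sketched in a few lines of Python. The function names below are hypothetical; the code simply rebuilds Table 1 from its border of ones.

```python
def pascal_matrix(size):
    """Build b[p][q] for 0 <= p, q < size from the law of formation (4)."""
    b = [[1] * size for _ in range(size)]  # ones in the 0th row and column
    for p in range(1, size):
        for q in range(1, size):
            # Each entry is the sum of the entry to its left and the one above.
            b[p][q] = b[p][q - 1] + b[p - 1][q]
    return b

def expansion_coefficients(b, n):
    """Coefficients of x^0, ..., x^n in (1+x)^n, read off via (5) as b[p][n-p]."""
    return [b[p][n - p] for p in range(n + 1)]

if __name__ == "__main__":
    table = pascal_matrix(5)
    for row in table:
        print(row)
    print(expansion_coefficients(table, 4))
```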


EXERCISE 1. Apply (4) to add the row and column for p= q=5 to Table 1. Then 
use (5) to write down the binomial formula for (1+ x). 


In essence, Newton attacked the problem of generalizing (5) to the case
where n is not a positive integer as a problem of interpolating between the
rows and columns of Pascal’s triangle in the form of Table 1. That is, he
sought some natural way, consistent with the law of formation (4), of 
inserting new rows and columns corresponding to non-integral values of p 
and q. 

Newton modeled his interpolation procedure on the complex process by 
which Wallis a decade earlier had deduced his famous infinite product for 
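$\pi/2$,

$$\frac{\pi}{2} = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots. \tag{6}$$

Nowadays Wallis’ product is established as an easy consequence of the
integrals

$$I_{2n} = \int_0^{\pi/2} \sin^{2n}x\,dx = \frac{\pi}{2}\cdot\frac{1}{2}\cdot\frac{3}{4}\cdots\frac{2n-1}{2n} \tag{7}$$

and

$$I_{2n+1} = \int_0^{\pi/2} \sin^{2n+1}x\,dx = \frac{2}{3}\cdot\frac{4}{5}\cdot\frac{6}{7}\cdots\frac{2n}{2n+1}, \tag{8}$$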


and it is an important step in proving Stirling’s asymptotic formula for the 
factorial, 

$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \tag{9}$$

(meaning that the limit as $n\to\infty$ of the ratio of the two sides is one).
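
Both limits can be observed numerically. The following Python sketch (function names ours) computes partial products of (6) and the ratio of the two sides of (9); it is an illustration only, and the convergence of the Wallis product is slow.

```python
import math

def wallis_partial_product(n):
    """Product of (2k)^2 / ((2k-1)(2k+1)) for k = 1..n, a partial product of (6)."""
    product = 1.0
    for k in range(1, n + 1):
        product *= (2 * k) ** 2 / ((2 * k - 1) * (2 * k + 1))
    return product

def stirling_ratio(n):
    """n! divided by sqrt(2*pi*n) * (n/e)^n, the ratio of the two sides of (9)."""
    return math.factorial(n) / (math.sqrt(2 * math.pi * n) * (n / math.e) ** n)

if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, wallis_partial_product(n), math.pi / 2)
    for n in (5, 50):
        print(n, stirling_ratio(n))
```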

The interpolation procedure by which Wallis discovered (6) did not 
suffice to prove it, nor did Newton’s interpolation procedure suffice to 
prove the binomial series—neither Newton nor anyone else proved it 
rigorously before the early nineteenth century. However, the mere dis- 
covery of the binomial series played an important role in establishing the 
use of infinite series as a working tool, and provided a cornucopia of new 
infinite series for use and application. In this chapter we describe the 
original and almost mystical investigations of Wallis and Newton, not
merely as one of the more exotic byways in the history of mathematics, but
also as a paradigm example of the frequently unexpected nature and 
sequence of mathematical invention, and of the crucial distinction between 
rigorous proof and the process of discovery that must precede it. 


EXERCISE 2. Apply the reduction formula

$$\int \sin^n x\,dx = -\frac{1}{n}\sin^{n-1}x\cos x + \frac{n-1}{n}\int \sin^{n-2}x\,dx$$

(resulting from integration by parts) to obtain integrals (7) and (8).


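EXERCISE 3. (a) Deduce from (7) and (8) that

$$\frac{\pi}{2} = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdots\frac{2n}{2n-1}\cdot\frac{2n}{2n+1}\cdot\frac{I_{2n}}{I_{2n+1}}.$$

(b) Show that

$$1 < \frac{I_{2n}}{I_{2n+1}} < \frac{I_{2n-1}}{I_{2n+1}} = 1 + \frac{1}{2n}.$$

(c) Derive Wallis’ product from (a) and (b).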


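EXERCISE 4. Deduce from Wallis’ product that

$$\lim_{n\to\infty} \frac{(n!)^2\,2^{2n}}{(2n)!\,\sqrt{n}} = \sqrt{\pi}.$$

Hint: Multiply and divide the right-hand side of

$$P_n = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdots\frac{2n}{2n-1}\cdot\frac{2n}{2n+1}$$

by $2\cdot 2\cdot 4\cdot 4\cdots(2n)(2n)$ to obtain

$$P_n = \frac{(n!)^4\,2^{4n}}{[(2n)!]^2\,(2n+1)}.$$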

EXERCISE 5. Write $n! = a_n\sqrt{n}\,(n/e)^n$, thereby defining $a_n$ for each n. Assuming that
$\lim_{n\to\infty} a_n = a \ne 0$, deduce from the previous exercise that $a = \sqrt{2\pi}$. This gives a
weak form of Stirling’s formula.


Wallis’ Interpolation Scheme and Infinite Product 


The last part of Wallis’ Arithmetica Infinitorum (The Arithmetic of In- 
finites) of 1655 is an attempt to compute, using his arithmetical indivisi- 
bles, the area of a quadrant of the unit circle, 


$$\frac{\pi}{4} = \int_0^1 \sqrt{1-x^2}\,dx. \tag{10}$$

Of course neither of the symbols $\pi$ and $\int$ was then in use. Wallis writes $\square$
for the reciprocal of the desired area, and in Proposition 121 sets up the
limit sum that we would write as


(a “Riemann sum” for the integral (10) corresponding to a subdivision of
[0, 1] into m equal subintervals).

Unable to directly compute this limit sum, he embarks on one of the 
more audacious investigations by analogy and intuition that has ever 
yielded a correct result (for anyone other than Euler), and winds up in the 
end with his infinite product (6) for $\pi/2$. As we have seen in Chapter 4, he
knew from earlier work in the Arithmetica infinitorum that

$$\int_0^1 x^{p/q}\,dx = \frac{1}{(p/q)+1} = \frac{q}{p+q} \tag{11}$$


if p and q are positive integers. This formula suffices for the evaluation of 
any integral of the form 


$$\int_0^1 (1 - x^{1/p})^q\,dx \tag{12}$$

if p and q are positive integers. For example,

$$\int_0^1 (1 - x^{1/3})^2\,dx = \int_0^1 (1 - 2x^{1/3} + x^{2/3})\,dx = 1 - \frac{2}{\tfrac13+1} + \frac{1}{\tfrac23+1} = \frac{1}{10}.$$

Wallis’ goal was to discover the “general law” or formula for the above 
integral in terms of p and q, and then substitute $p = q = \tfrac12$ in this formula to
obtain

$$\frac{\pi}{4} = \frac{1}{\square} = \int_0^1 (1 - x^2)^{1/2}\,dx.$$


For the purpose of recognizing the pattern, he found it more convenient 
to work with the reciprocal of the integral in (12), 


$$f(p, q) = \frac{1}{\int_0^1 (1 - x^{1/p})^q\,dx}.$$

He began by computing the values of $f(p, q)$ for $p, q \le 10$, and obtained
the results shown in Table 2, where $a_{p,q} = f(p, q)$ is tabulated in the pth row
and qth column.

Table 2. $a_{p,q} = f(p, q)$

    p \ q    0     1     2     3      4    ...       10
      0      1     1     1     1      1    ...        1
      1      1     2     3     4      5    ...       11
      2      1     3     6    10     15    ...       66
      3      1     4    10    20     35    ...      286
      4      1     5    15    35     70    ...     1001
     ...
     10      1    11    66   286   1001    ...   184756


On the basis of these computed values for $p, q \le 10$, Wallis took it as
obvious that Table 2 is (for all p and q) simply a table of binomial
coefficients. That is, each entry in the table is the sum of the one above it 
and the one to its left (compare Table 2 and Table 1). 
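
The entries of Table 2 can be recomputed exactly by expanding the integrand of (12) binomially and integrating each term with (11). The Python sketch below does this for integers p ≥ 1 and q ≥ 0; the function name `wallis_f` is ours, and the use of `math.comb` for the integral binomial coefficients is a modern convenience, not Wallis' procedure.

```python
from fractions import Fraction
from math import comb

def wallis_f(p, q):
    """Reciprocal of the integral (12) for integers p >= 1 and q >= 0."""
    # Expand (1 - x^(1/p))^q binomially and integrate each term with (11):
    # the integral of x^(k/p) over [0, 1] is p / (k + p).
    integral = sum(
        Fraction((-1) ** k * comb(q, k) * p, k + p) for k in range(q + 1)
    )
    return 1 / integral

if __name__ == "__main__":
    for p in range(1, 5):
        print([int(wallis_f(p, q)) for q in range(5)])
```

The printed rows agree with the rows p = 1, ..., 4 of Table 2.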


EXERCISE 6. (For those familiar with the gamma and beta functions). Substitute 
$x = y^p$ to obtain

$$\frac{1}{a_{p,q}} = \int_0^1 (1 - x^{1/p})^q\,dx = p\int_0^1 (1-y)^q\,y^{p-1}\,dy = p\,B(p, q+1) = \frac{\Gamma(p+1)\,\Gamma(q+1)}{\Gamma(p+q+1)} = \frac{p!\,q!}{(p+q)!} = \frac{1}{\binom{p+q}{p}}.$$


In addition to explaining the evident diagonal symmetry of Table 2, 
Exercise 6 provides formulas that can be used to interpolate between the 
elements of a given row or column of the table. Wallis actually writes 
down these formulas on the basis of regarding the rows as sequences of 
“figurate” numbers. 

For example, the second row (p = 2) consists of the “triangular” numbers

$$1, 3, 6, 10, 15, \ldots,$$

for which

$$a_{2,q} = \tfrac{1}{2}(q+1)(q+2).$$

Similarly, the third row (p = 3) consists of the “pyramidal” numbers

$$1, 4, 10, 20, 35, \ldots,$$

for which

$$a_{3,q} = \tfrac{1}{6}(q+1)(q+2)(q+3).$$

In general,

$$a_{p,q} = \frac{1}{p!}(q+1)(q+2)\cdots(q+p). \tag{13}$$


EXERCISE 7. Conclude from (13) that

$$a_{p,q} = \frac{p+q}{q}\,a_{p,q-1}. \tag{14}$$


Now Wallis wants to expand Table 2 by interpolation, to insert rows and 
columns corresponding to half-integral values of p and q (including in 
particular $p = q = \tfrac12$, for which $a_{1/2,1/2} = \square$). To begin with, he inserts
half-integral values for q into (13) to interpolate between the elements of
the pth row (p integral) of Table 2. For example,

$$a_{2,1/2} = \tfrac12\left(\tfrac12+1\right)\left(\tfrac12+2\right) = \tfrac{15}{8},$$

$$a_{3,5/2} = \tfrac16\left(\tfrac52+1\right)\left(\tfrac52+2\right)\left(\tfrac52+3\right) = \tfrac{231}{16}.$$

By diagonal symmetry, this at the same time inserts values $a_{p,q}$
(p half-integral) between the elements of the qth column (q integral) of
Table 2. The result of this interpolation of values of $a_{p,q}$ for either p or
q (but not both) half-integral is the expanded Table 3 below, in which the
interpolated values are printed in boldface. We have also inserted the
unknown value $a_{1/2,1/2} = \square$.

What remained for Wallis at this point was the crucial step of “filling in 
the blanks” in Table 3. To simplify the description of this final interpola- 
tion let us write 


$$m = 2p, \qquad n = 2q, \qquad b_{m,n} = a_{p,q} = a_{m/2,\,n/2}.$$


Table 3. $a_{p,q} = f(p, q) = b_{2p,2q}$

              n     0       1       2       3       4       5       6
     m    p \ q     0      1/2      1      3/2      2      5/2      3
     0     0        1       1       1       1       1       1       1
     1    1/2       1       □      3/2            15/8            35/16
     2     1        1      3/2      2      5/2      3      7/2      4
     3    3/2       1              5/2            35/8           105/16
     4     2        1     15/8      3     35/8      6     63/8     10
     5    5/2       1              7/2            63/8           231/16
     6     3        1     35/16     4    105/16    10    231/16    20

If m and n are even integers, then it follows from (14) that

$$b_{m,n} = a_{m/2,\,n/2} = \frac{m/2 + n/2}{n/2}\,a_{m/2,\,(n/2)-1},$$

$$b_{m,n} = \frac{m+n}{n}\,b_{m,n-2}. \tag{15}$$


However, Wallis noted that Equation (15) is also satisfied by those
elements $b_{m,n}$ for m or n odd that were inserted in Table 3 in the previous
step. For example, from b,,=2 we obtain 


and from b,, = we obtain 


EXERCISE 8. Use elementary properties of the gamma function and the fact that 


$$a_{p,q} = \frac{\Gamma(p+q+1)}{\Gamma(p+1)\,\Gamma(q+1)}$$

to establish (15) for all integers $m \ge 0$, $n \ge 2$.


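Wallis then used (15) to fill in the remaining elements in the row m = 1
(and by symmetry the column n = 1) in terms of $\square$. For example,

$$b_{1,3} = \tfrac{4}{3}\,b_{1,1} = \tfrac{4}{3}\square,$$

and

$$b_{1,5} = \tfrac{6}{5}\,b_{1,3} = \tfrac{8}{5}\square.$$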


Finally, he filled in the remaining blanks in Table 3 by using the “fact” 
that 


$$b_{m,n} = b_{m,n-2} + b_{m-2,n}. \tag{16}$$

For m and n even, Equation (16) is just the familiar law of formation,
$a_{p,q} = a_{p,q-1} + a_{p-1,q}$, of Table 2 as Pascal’s triangle; Wallis simply
assumed “by analogy” that (16) holds also when m or n or both are odd.


For example, having already computed $b_{1,3} = b_{3,1} = \tfrac43\square$, (16) gives

$$b_{3,3} = b_{3,1} + b_{1,3} = \tfrac{8}{3}\square.$$

Again,

$$b_{3,5} = b_{3,3} + b_{1,5} = \tfrac{8}{3}\square + \tfrac{8}{5}\square = \tfrac{64}{15}\square, \quad \text{etc.}$$


EXERCISE 9. Use elementary properties of the gamma function and the fact that

$$a_{p,q} = \frac{\Gamma(p+q+1)}{\Gamma(p+1)\,\Gamma(q+1)}$$

to establish (16) for all integers $m, n \ge 2$.


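Now the final computation of $\square = b_{1,1}$ comes from the completed row
for m = 1:

$$1,\quad \square,\quad \tfrac{3}{2},\quad \tfrac{4}{3}\square,\quad \tfrac{15}{8},\quad \tfrac{8}{5}\square,\quad \tfrac{35}{16},\quad \ldots$$

Since

$$b_{1,n} = \frac{n+1}{n}\,b_{1,n-2},$$

it follows easily by induction that

$$b_{1,n} = \frac{3}{2}\times\frac{5}{4}\times\frac{7}{6}\times\cdots\times\frac{n+1}{n} \tag{17}$$

if n is even, while

$$b_{1,n} = \square\times\frac{4}{3}\times\frac{6}{5}\times\frac{8}{7}\times\cdots\times\frac{n+1}{n} \tag{18}$$

if n is odd.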
In addition it is clear from the definition

$$b_{1,n} = \frac{1}{\int_0^1 (1 - x^2)^{n/2}\,dx}$$

that the sequence is monotone increasing,

$$b_{1,1} < b_{1,2} < b_{1,3} < \cdots < b_{1,n} < b_{1,n+1} < \cdots.$$
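If we substitute (17) and (18) into

$$b_{1,2n-1} < b_{1,2n} < b_{1,2n+1}$$

the result is

$$\frac{\square}{2}\prod_{k=1}^{n}\frac{2k}{2k-1} < \prod_{k=1}^{n}\frac{2k+1}{2k} < \frac{\square}{2}\prod_{k=1}^{n+1}\frac{2k}{2k-1},$$

so rearrangement gives

$$\prod_{k=1}^{n}\frac{(2k)^2}{(2k-1)(2k+1)} < \frac{2}{\square} < \left[\prod_{k=1}^{n}\frac{(2k)^2}{(2k-1)(2k+1)}\right]\frac{2n+2}{2n+1}.$$

Since $2/\square = \pi/2$ and $(2n+2)/(2n+1)$ approaches one as $n\to\infty$, it finally
follows that

$$\frac{\pi}{2} = \prod_{k=1}^{\infty}\frac{(2k)^2}{(2k-1)(2k+1)} = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots,$$

as desired.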
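
The squeeze on the unknown value can be reproduced with exact fractions. In the Python sketch below (all names ours) the row m = 1 is generated from recursion (15); for odd n only the rational multiplier of the unknown □ is stored. Dividing the inequality between three consecutive entries then gives numerical bounds on □, which should bracket 4/π.

```python
from fractions import Fraction
import math

def row_one(n_max):
    """Row m = 1 built from (15): b[1][n] = (n+1)/n * b[1][n-2].

    Even n gives plain numbers; odd n gives multiples of the unknown square,
    so only the rational multiplier is stored in that case.
    """
    coeff = {0: Fraction(1), 1: Fraction(1)}  # b_{1,0} = 1 and b_{1,1} = 1 * square
    for n in range(2, n_max + 1):
        coeff[n] = Fraction(n + 1, n) * coeff[n - 2]
    return coeff

def square_bounds(n):
    """Bounds on the square implied by b_{1,2n-1} < b_{1,2n} < b_{1,2n+1}."""
    c = row_one(2 * n + 1)
    lower = c[2 * n] / c[2 * n + 1]
    upper = c[2 * n] / c[2 * n - 1]
    return lower, upper

if __name__ == "__main__":
    for n in (1, 10, 200):
        low, high = square_bounds(n)
        print(n, float(low), float(high), 4 / math.pi)
```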

Thus Wallis derived his famous infinite product on the basis of several 
unproved assumptions which, as we have indicated in the exercises, can be 
substantiated using elementary properties of the gamma function. This 
reliance upon reasoning “by analogy” did not escape the criticism of 
Wallis’ contemporaries. Thomas Hobbes (1588-1679) attacked the 
Arithmetica infinitorum as “a scab of symbols,” and objected to “the whole 
herd of them who apply their algebra to geometry.” Fermat more specifi- 
cally criticized Wallis’ use of “incomplete” rather than “complete” (i.e. 
ordinary mathematical) induction. In his Algebra of 1685, Wallis replied 
that his purpose “was not so much to show a method Demonstrating 
things already known as to show a way of Investigation or finding out of 
things yet unknown. Thus I look upon [incomplete] induction [or analogy] 
as a very good Method of Investigation; as that which doth very often lead 
us to the easy discovery of a General Rule; or is, at least, a good 
preparative to such an one” (quoted by Nunn in [6], p. 385). Thus he 
alluded to the heuristic processes by which mathematical results are often 
discovered, prior to any attempt at rigorous proof.

For a translation of a pertinent part of the Arithmetica infinitorum, see 
Struik’s source book ([7], pp. 244-253). Nunn [6] gives an English para-
phrase that we have made use of. See also Whiteside’s article ([8],
pp. 236-241) for an outline of Wallis’ interpolation method. 


Quadrature of the Cissoid 


In his Tractatus duo de cycloide (1659) Wallis applied his method of 
“interpolation by analogy” to the quadrature of the cissoid. The following 
set of exercises outlines his method (as described by Whiteside [8], pp. 
242-243) and will provide the reader with first-hand practice. 


EXERCISE 10. The cissoid, associated with the circle $y^2 = x(1-x)$ of diameter 1, is
defined as the locus of the point B such that BL/OL = OL/KL (Fig. 1). Show that
its equation in rectangular coordinates is

$$y = x^{3/2}(1-x)^{-1/2}, \qquad x \in (0, 1).$$

Obviously $B \to O$ as $x \to 0$, and the line x = 1 is an asymptote. Hence the area under
the cissoid is

$$A = \int_0^1 x^{3/2}(1-x)^{-1/2}\,dx$$

provided that this improper integral converges.

Figure 1


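EXERCISE 11. Let

$$a_m = \int_0^1 x^{1/2}(1-x)^{m/2}\,dx = \int_0^1 x^{m/2}(1-x)^{1/2}\,dx$$

and note that $a_1 = \pi/8$, the area of the semi-circle of diameter one. Calculate
directly the values

$$a_0 = \frac{2}{3}, \qquad a_2 = \frac{2}{3}\cdot\frac{2}{5}, \qquad a_4 = \frac{2}{3}\cdot\frac{2}{5}\cdot\frac{4}{7}, \qquad a_6 = \frac{2}{3}\cdot\frac{2}{5}\cdot\frac{4}{7}\cdot\frac{6}{9},$$

from which it appears that

$$a_m = \frac{m}{m+3}\,a_{m-2}$$

for m even. Assuming that this recursion relation holds for m odd as well, it follows
that

$$a_3 = \frac{3}{6}\,a_1 = \frac{\pi}{16}.$$
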
EXERCISE 12. Now let

$$b_n = \int_0^1 x^{3/2}(1-x)^{n/2}\,dx$$

and note that $b_{-1}$ is the desired area under the cissoid. Calculate directly the values

$$b_0 = \frac{2}{5}, \qquad b_2 = \frac{2}{5}\cdot\frac{2}{7}, \qquad b_4 = \frac{2}{5}\cdot\frac{2}{7}\cdot\frac{4}{9},$$

from which it appears that

$$b_n = \frac{n}{n+5}\,b_{n-2}$$

for n even. Assuming that this recursion relation holds for n odd as well, it follows
that

$$b_{-1} = 6b_1 = 6a_3 = \frac{3\pi}{8},$$

so the area of the cissoid is three times the area of the generating semi-circle.


EXERCISE 13. Rigorously establish the results of the preceding exercise by noting
that

$$b_n = B\!\left(\frac{5}{2}, \frac{n}{2}+1\right) = \frac{\Gamma(5/2)\,\Gamma(n/2+1)}{\Gamma(n/2+7/2)} = \frac{n}{n+5}\cdot\frac{\Gamma(5/2)\,\Gamma(n/2)}{\Gamma(n/2+5/2)}.$$


The Discovery of the Binomial Series 


The formulation of the binomial series in 1665, as a result of his reading of 
the Arithmetica infinitorum, was Newton’s first mathematical discovery of 
lasting significance. He did not formally publish it, but described it in the 
two famous letters of 1676 that he sent to Henry Oldenburg, secretary of 
the Royal Society of London, for transmission to Leibniz. In the epistola 
prior dated June 3, 1676 (see Chapter 8 for references to Newton’s 
correspondence) he states that 


Extractions of roots are much shortened by this theorem, 


$$(P + PQ)^{m/n} = P^{m/n} + \frac{m}{n}AQ + \frac{m-n}{2n}BQ + \frac{m-2n}{3n}CQ + \frac{m-3n}{4n}DQ + \text{etc.} \tag{19}$$


where P + PQ signifies the quantity whose root or even any power, or the 
root of a power, is to be found; P signifies the first term of that quantity, 
Q the remaining terms divided by the first, and m/n the numerical index 
of the power of P+ PQ, whether that power is integral or (so to speak) 
fractional, whether positive or negative. 


Each of the symbols A, B, C,..., denotes the immediately preceding 
term; that is, $A = P^{m/n}$, $B = (m/n)AQ$, etc.


EXERCISE 14. Show that formula (19) is equivalent to the binomial series in the 
more familiar form 


$$(P + PQ)^{m/n} = P^{m/n}\left[1 + \sum_{k=1}^{\infty}\binom{m/n}{k}Q^k\right],$$

where the binomial coefficients are defined by

$$\binom{m/n}{k} = \frac{1}{k!}\left(\frac{m}{n}\right)\left(\frac{m}{n}-1\right)\cdots\left(\frac{m}{n}-k+1\right).$$


The binomial coefficients for positive integral powers had probably been 
known for hundreds of years. It is of independent significance that, in 
stating the generalization for positive or negative rational powers, Newton 
introduces for the first time the use of negative or fractional exponents for 
the purposes of routine algebraic computation. As we have seen, Wallis 
wrote “$1/a^2$, whose index is −2” and “$\sqrt{a}$, whose index is 1/2,” but never
actually employed negative or fractional exponents. But Newton follows 
the above statement of the binomial series with the innocuously phrased 
remark that: 


For as analysts, instead of aa, aaa, etc., are accustomed to write $a^2$, $a^3$,
etc., so instead of $\sqrt{a}$, $\sqrt{a^3}$, $\sqrt{c}\!:\!a^5$ (i.e. $\sqrt[3]{a^5}$), etc. I write $a^{1/2}$,
$a^{3/2}$, $a^{5/3}$, and instead of $1/a$, $1/aa$, $1/a^3$, I write $a^{-1}$, $a^{-2}$, $a^{-3}$.


The statement without proof of the binomial series in the epistola prior is 
followed by nine illustrative examples, including the following. 


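$$(c^2 + x^2)^{1/2} = c + \frac{x^2}{2c} - \frac{x^4}{8c^3} + \frac{x^6}{16c^5} - \frac{5x^8}{128c^7} + \frac{7x^{10}}{256c^9} + \text{etc.}$$

$$(d+e)^{4/3} = d^{4/3} + \frac{4ed^{1/3}}{3} + \frac{2e^2}{9d^{2/3}} - \frac{4e^3}{81d^{5/3}} + \text{etc.}$$

$$\frac{1}{d+e} = (d+e)^{-1} = \frac{1}{d} - \frac{e}{d^2} + \frac{e^2}{d^3} - \frac{e^3}{d^4} + \text{etc.}$$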


In return to Leibniz’ request for information concerning the origin of the 
binomial series, Newton outlined in the epistola posterior of October 24, 
1676 the steps by which he had been led to its discovery. The binomial 
formula for positive integral powers had not previously been known in a 
form that permitted the simple replacement of an integral exponent by a 
fractional one. Instead its discovery was based upon a complicated in- 
vestigation using Wallis’ method of tabular interpolation. As Newton tells 
it (in translation; see Newton’s Correspondence, Vol. II, p. 130): 


At the beginning of my mathematical studies, when I had met with the 
works of our celebrated Wallis, on considering the series by the intercala- 
tion of which he himself exhibits the area of the circle and the hyperbola, 
the fact that, in the series of curves whose common base or axis is x and
the ordinates $(1-x^2)^{0/2}$, $(1-x^2)^{1/2}$, $(1-x^2)^{2/2}$, $(1-x^2)^{3/2}$, $(1-x^2)^{4/2}$,
$(1-x^2)^{5/2}$, etc., if the areas of every other of them, namely $x$, $x - \tfrac13 x^3$,
$x - \tfrac23 x^3 + \tfrac15 x^5$, $x - \tfrac33 x^3 + \tfrac35 x^5 - \tfrac17 x^7$, etc. could be interpolated, we
should have the areas of the intermediate ones, of which the first
$(1-x^2)^{1/2}$ is the circle: in order to interpolate these series I noted that in
all of them the first term was x and that the second terms $(0/3)x^3$,
$(1/3)x^3$, $(2/3)x^3$, $(3/3)x^3$, etc., were in arithmetical progression, and
hence that the first two terms of the series to be intercalated ought to be

$$x - \tfrac13\left(\tfrac12 x^3\right), \qquad x - \tfrac13\left(\tfrac32 x^3\right), \qquad x - \tfrac13\left(\tfrac52 x^3\right), \qquad \text{etc.}$$

To intercalate the rest I began to reflect that the denominators 1, 3, 5, 7, 
etc. were in arithmetical progression, so that the numerical coefficients of 
the numerators only were still in need of investigation. But in the 
alternately given areas these were the figures [i.e., digits] of powers of the
number 11, namely of these $11^0$, $11^1$, $11^2$, $11^3$, $11^4$, that is, first 1; then
1, 1; thirdly 1, 2, 1; fourthly 1, 3, 3, 1; fifthly 1, 4, 6, 4, 1, etc. And so I 
began to inquire how the remaining figures in these series could be 
derived from the first two given figures... . 


Thus Newton considered the sequence of functions 
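$$f_n(x) = \int_0^x (1-t^2)^{n/2}\,dt,$$

whereas Wallis had only considered the sequence of numbers

$$a_n = \int_0^1 (1-t^2)^{n/2}\,dt.$$

When n is even $f_n(x)$ can be evaluated explicitly, since he knows from
Wallis that

$$\int_0^x t^p\,dt = \frac{x^{p+1}}{p+1}.$$

Thus

$$f_0(x) = 1(x),$$

$$f_2(x) = 1(x) + 1\left(-\tfrac13 x^3\right),$$

$$f_4(x) = 1(x) + 2\left(-\tfrac13 x^3\right) + 1\left(\tfrac15 x^5\right),$$

$$f_6(x) = 1(x) + 3\left(-\tfrac13 x^3\right) + 3\left(\tfrac15 x^5\right) + 1\left(-\tfrac17 x^7\right), \quad \text{etc.,}$$

displaying the integral binomial coefficients as “figures of powers of the
number 11.” Newton wants to interpolate, for n odd, the coefficients of the
infinite series

$$f_n(x) = \sum_{m=0}^{\infty} a_{m,n}\left[\frac{(-1)^m x^{2m+1}}{2m+1}\right].$$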

In particular, when n = 1 this will give the area of a segment of the circle.
The details of his interpolation procedure are given in an unpublished
manuscript composed in 1665 (see pages 126-134 of the first volume of
Newton’s Mathematical Papers, referenced in Chapter 8).

Table 4

    m \ n    0     1     2     3     4     5     6    ...   10
      0      1     1     1     1     1     1     1    ...    1
      1      0    1/2    1    3/2    2    5/2    3    ...    5
      2      0     *     0     *     1     *     3    ...   10
      3      0     *     0     *     0     *     1    ...   10
      4      0     *     0     *     0     *     0    ...    5
      5      0     *     0     *     0     *     0    ...    1

He first observes, as in the above quotation, that when n is even the first
two coefficients are

$$a_{0,n} = 1 \quad\text{and}\quad a_{1,n} = \frac{n}{2},$$

and he assumes this to be true for n odd as well. The problem is to fill in
the missing entries (asterisks) in the above tabular display of the
coefficients $a_{m,n}$.

He notices that, because of the familiar law of formation of the integral
binomial coefficients,

$$a_{m,n+2} = a_{m-1,n} + a_{m,n} \tag{20}$$


in the present notation, the successive rows of the subtabulation, that 
consists only of the known columns (those for n even), are of the forms 
listed in Table 5. 

He then assumes, in the fashion of Wallis, that the individual rows of the 
full tabulation (in Table 4) are of the same forms as in Table 5, but with 
the constants a, b, c,... to be determined separately for each row. It may 
be noted that this would follow from the assumption that, for each m, $a_{m,n}$
is a polynomial in n of degree m.

Table 5

    m \ n   0     1        2           3             4               5               6
      0     a     a        a           a             a               a               a
      1     b    a+b     2a+b        3a+b          4a+b            5a+b            6a+b
      2     c    b+c    a+2b+c     3a+3b+c       6a+4b+c        10a+5b+c        15a+6b+c
      3     d    c+d    b+2c+d    a+3b+3c+d    4a+6b+4c+d    10a+10b+5c+d    20a+15b+6c+d

Equating the values given in Table 4 for $a_{2,n}$ with n even, with the literal
expressions given in the row m = 2 of Table 5, we obtain the equations

$$0 = c,$$

$$0 = a + 2b + c,$$

$$1 = 6a + 4b + c,$$

which are readily solved for $a = \tfrac14$, $b = -\tfrac18$, $c = 0$. Therefore

$$a_{2,1} = b + c = -\tfrac18, \qquad a_{2,3} = 3a + 3b + c = \tfrac38, \quad \text{etc.}$$

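Similarly, equating the known values of $a_{3,n}$ for n even with the above
literal expressions for m = 3, we obtain the equations

$$0 = d,$$

$$0 = b + 2c + d,$$

$$0 = 4a + 6b + 4c + d,$$

$$1 = 20a + 15b + 6c + d,$$

which are readily solved for $a = \tfrac18$, $b = -\tfrac18$, $c = \tfrac{1}{16}$, $d = 0$. Therefore

$$a_{3,1} = c + d = \tfrac{1}{16}, \qquad a_{3,3} = a + 3b + 3c + d = -\tfrac{1}{16}, \quad \text{etc.}$$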
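
The remark that each row is a polynomial in n of degree m suggests a compact way to reproduce these interpolated values. The Python sketch below is a modern restatement, not Newton's own tabular scheme with the constants a, b, c, ...: it fits the degree-m polynomial through the known even-column entries of row m by Lagrange interpolation and evaluates it at an odd column. The function name `interpolate_row` is ours.

```python
from fractions import Fraction
from math import comb

def interpolate_row(m, n):
    """Value at column n of the degree-m polynomial matching row m at even columns."""
    nodes = [2 * k for k in range(m + 1)]         # columns n = 0, 2, ..., 2m
    values = [comb(k, m) for k in range(m + 1)]   # known entries a_{m,2k}
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = Fraction(yi)
        for j, xj in enumerate(nodes):
            if j != i:
                # Lagrange basis factor (n - x_j) / (x_i - x_j).
                term *= Fraction(n - xj, xi - xj)
        total += term
    return total

if __name__ == "__main__":
    print([str(interpolate_row(m, 1)) for m in range(5)])  # column n = 1
    print([str(interpolate_row(m, 3)) for m in range(5)])  # column n = 3
```

The values produced for m = 2 and m = 3 agree with those obtained above from the two linear systems.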


EXERCISE 15. Write down enough terms in the row for m = 4, beginning with

$$e,\quad d+e,\quad c+2d+e,\quad b+3c+3d+e,\quad \ldots,$$

to obtain in the above way 5 equations in the unknowns a, b, c, d, e. Solve these
equations and substitute to obtain

$$a_{4,1} = -\frac{5}{128} \quad\text{and}\quad a_{4,3} = \frac{3}{128}.$$


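The values of $a_{m,1}$ obtained thus far give

$$f_1(x) = \int_0^x (1-t^2)^{1/2}\,dt = x - \frac{1}{2}\cdot\frac{x^3}{3} - \frac{1}{8}\cdot\frac{x^5}{5} - \frac{1}{16}\cdot\frac{x^7}{7} - \frac{5}{128}\cdot\frac{x^9}{9} - \text{etc.}$$

Newton actually continued in this manner to calculate the two additional
coefficients $a_{5,1} = 7/256$ of $-x^{11}/11$ and $a_{6,1} = -21/1024$ of $x^{13}/13$.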
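At this point he could discern the general form of the coefficients, namely
that they arise from continued multiplication of terms of the product

$$\frac{\tfrac12}{1}\times\frac{\tfrac12-1}{2}\times\frac{\tfrac12-2}{3}\times\frac{\tfrac12-3}{4}\times\cdots.$$

That is,

$$-\frac{1}{8} = \frac{1}{2}\times\frac{\tfrac12-1}{2},$$

$$\frac{1}{16} = \frac{1}{2}\times\frac{\tfrac12-1}{2}\times\frac{\tfrac12-2}{3},$$

$$-\frac{5}{128} = \frac{1}{2}\times\frac{\tfrac12-1}{2}\times\frac{\tfrac12-2}{3}\times\frac{\tfrac12-3}{4}, \quad \text{etc.,}$$

or

$$a_{m,1} = \frac{\tfrac12\left(\tfrac12-1\right)\left(\tfrac12-2\right)\cdots\left(\tfrac12-m+1\right)}{m!} = \binom{1/2}{m}$$

in modern notation.

Thus it was by means of Wallisian quadrature by interpolation that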
Newton discovered the general binomial coefficient! But, as he continues 
in the epistola posterior, 


When I had learnt this, I immediately began to consider that the terms 


$$(1-x^2)^{0/2}, \quad (1-x^2)^{2/2}, \quad (1-x^2)^{4/2}, \quad (1-x^2)^{6/2}, \quad \text{etc.,}$$

could be interpolated in the same way as the areas generated by them:
and that nothing else was required for this purpose but to omit the
denominators 1, 3, 5, 7, etc., which are in the terms expressing the areas;
this means that the coefficients of the terms of the quantity to be
intercalated $(1-x^2)^{1/2}$, or $(1-x^2)^{3/2}$, or in general $(1-x^2)^m$, arise by the
continued multiplication of the terms of this series

$$m \times \frac{m-1}{2} \times \frac{m-2}{3} \times \frac{m-3}{4}, \quad \text{etc.}$$


That is, termwise differentiation of his quadrature result gives 


$$(1-x^2)^m = \sum_{k=0}^{\infty} (-1)^k \binom{m}{k} x^{2k},$$

where

$$\binom{m}{k} = \frac{m(m-1)\cdots(m-k+1)}{k!}.$$

As examples he lists 

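$$(1-x^2)^{1/2} = 1 - \tfrac12 x^2 - \tfrac18 x^4 - \tfrac{1}{16}x^6 - \cdots,$$

$$(1-x^2)^{3/2} = 1 - \tfrac32 x^2 + \tfrac38 x^4 + \tfrac{1}{16}x^6 + \cdots,$$

and

$$(1-x^2)^{1/3} = 1 - \tfrac13 x^2 - \tfrac19 x^4 - \tfrac{5}{81}x^6 - \cdots.$$

“So then the general reduction of radicals into infinite series by that rule,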
which I laid down at the beginning of [the epistola prior] became known to 
me, and that before I was acquainted with the extraction of roots.” 

Of course Newton was aware that his interpolatory investigation did not 
constitute a proof. In order to test his results, he squared the series for 
$(1-x^2)^{1/2}$, “and it became $1 - x^2$, the remaining terms vanishing by the
continuation of the series to infinity. And even so $1 - (1/3)x^2 - (1/9)x^4 -
(5/81)x^6$, etc., multiplied twice into itself also produced $1 - x^2$.”
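
Newton's check by multiplication is easy to repeat with exact arithmetic. In the Python sketch below (names ours) a series in powers of x² is stored as a list of coefficients, the coefficients of (1 − x²)^a are generated by the continued-multiplication rule, and products are truncated to the order of the inputs.

```python
from fractions import Fraction

def series_coeffs(a, terms):
    """Coefficients c_k of x^(2k) in the binomial series for (1 - x^2)^a."""
    coeffs, c = [], Fraction(1)
    for k in range(terms):
        coeffs.append(c)
        # Multiply by the next factor (a - k)/(k + 1); the extra minus sign
        # comes from the power of (-x^2).
        c = -c * (Fraction(a) - k) / (k + 1)
    return coeffs

def multiply(a, b):
    """Cauchy product of two coefficient lists, truncated to len(a) terms."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]

if __name__ == "__main__":
    root = series_coeffs(Fraction(1, 2), 8)
    print([str(c) for c in multiply(root, root)])
    cube_root = series_coeffs(Fraction(1, 3), 6)
    print([str(c) for c in multiply(multiply(cube_root, cube_root), cube_root)])
```

Up to the order retained, every coefficient beyond those of 1 − x² cancels exactly, which is the finite shadow of Newton's "remaining terms vanishing."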


EXERCISE 16. Square the quantity 
$$P(x) = 1 - \tfrac12 x^2 - \tfrac18 x^4 - \tfrac{1}{16}x^6 - \tfrac{5}{128}x^8 + R(x),$$

where $x^{10}$ is the lowest power occurring in R(x), to obtain

$$[P(x)]^2 = 1 - x^2 + Q(x).$$


What is the lowest power of x occurring in Q(x)? 


This verification of the binomial expansion motivated Newton to “try 
whether, conversely, these series, which it thus affirmed to be roots of the
quantity $1 - x^2$, might not be extracted out of it in an arithmetical
manner.” As indicated by his notes on his early studies of contemporary
mathematical texts, he was familiar with the following method of Viète for
the extraction of the square root of a number N.

Given an estimate A of $\sqrt{N}$, denote by E the error, so $\sqrt{N} = A + E$.
Then

$$N - A^2 = (A+E)^2 - A^2 = 2AE + E^2 \approx 2AE,$$

so

$$E \approx \frac{N - A^2}{2A}$$

if A is a sufficiently good estimate of $\sqrt{N}$ that $E^2$ is small compared with
2AE. This provides the basis for the computation of successive approxima-
tions to $\sqrt{N}$, as follows. Starting with an initial estimate $x_1$ of $\sqrt{N}$, define

$$x_{n+1} = x_n + \frac{N - x_n^2}{2x_n} = x_n + e_n, \tag{21}$$

where

$$e_n = \frac{N - x_n^2}{2x_n}.$$

Note that $N - x_{n+1}^2 = (N - x_n^2) - 2x_n e_n - e_n^2$, so the numerator of $e_{n+1}$ is
obtained by simple subtractions from the numerator of $e_n$.

EXERCISE 17. Suppose that the sequence $\{x_n\}_1^\infty$ defined inductively by (21) con-
verges to $x_* \ne 0$. Then take the limit as $n\to\infty$ in Equation (21) to show that
$x_*^2 = N$.


Prior to the onslaught of the so-called “new math,” Viète’s method was
frequently taught to school children in a rote form illustrated by the
following computation.


    N           = 54,756        x_1 = 200
    x_1^2       = 40,000        e_1 = 10 [14,756 / 10(400)] = 30
    N - x_1^2   = 14,756        x_2 = 230
    2 x_1 e_1   = 12,000        e_2 = [1856 / 460] = 4
    e_1^2       =    900        x_3 = 234
    N - x_2^2   =  1,856
    2 x_2 e_2   =  1,840
    e_2^2       =     16
    N - x_3^2   =      0


Thus $\sqrt{54{,}756} = 234$.

EXERCISE 18. Calculate the square root of 461,041 by Viète’s method, as above.


In order to check his binomial expansion of $(1-x^2)^{1/2}$, Newton applied
a literal version of Viète’s method. Starting with $y_1 = 1$, he iteratively
calculated

$$y_{n+1} = y_n + e_n,$$

where $e_n$ is the term of lowest degree in

$$\frac{(1-x^2) - y_n^2}{2y_n},$$

noting that

$$(1-x^2) - y_{n+1}^2 = (1 - x^2 - y_n^2) - 2y_n e_n - e_n^2$$


for the purpose of the successive computations. As he continues in the 
epistola posterior, “the matter turned out well. This was the form of the 
working in square roots.” 


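    1 - x^2  ( 1 - (1/2)x^2 - (1/8)x^4 - (1/16)x^6  etc.
    1
    -----
      - x^2
      - x^2 + (1/4)x^4
      ----------------
            - (1/4)x^4
            - (1/4)x^4 + (1/8)x^6 + (1/64)x^8
            ---------------------------------
                       - (1/8)x^6 - (1/64)x^8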
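
The literal root extraction can be imitated on polynomials with rational coefficients. In the Python sketch below (all names ours) a polynomial is a dictionary from degree to coefficient, and each step appends half of the lowest-degree term of the current remainder. For brevity the remainder is recomputed as the target minus y² at each step, instead of being updated by the two subtractions shown in the layout above; the target is assumed to have constant term 1.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials stored as {degree: coefficient}."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0) + ai * bj
    return out

def poly_sub(a, b):
    """Difference a - b, with zero coefficients dropped."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) - v
    return {k: v for k, v in out.items() if v != 0}

def literal_sqrt(target, steps):
    """Extract a square root of `target` (constant term 1) one term per step."""
    y = {0: Fraction(1)}
    for _ in range(steps):
        remainder = poly_sub(target, poly_mul(y, y))
        if not remainder:
            break
        degree = min(remainder)
        # Lowest-degree term of remainder / (2 y), since y begins with 1.
        y[degree] = y.get(degree, 0) + Fraction(remainder[degree]) / 2
    return y

if __name__ == "__main__":
    root = literal_sqrt({0: 1, 2: -1}, 4)
    print({k: str(v) for k, v in sorted(root.items())})
```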


EXERCISE 19. Work through the above calculation, as well as the following
extraction of $\sqrt{1+x^2}$, identifying $y_n$ and $e_n$ at each stage of the computations.


    1 + x^2  ( 1 + x^2/2 - x^4/8 + x^6/16 - 5x^8/128 + 7x^10/256 - 21x^12/1024
    1
    -----
        x^2
        x^2 + x^4/4
        -----------
            - x^4/4
            - x^4/4 - x^6/8 + x^8/64
            ------------------------
                      x^6/8 - x^8/64
                      x^6/8 + x^8/16 - x^10/64 + x^12/256
                      -----------------------------------
                            - 5x^8/64 + x^10/64 - x^12/256.


Continue through two more steps to obtain the terms in $x^{10}$ and $x^{12}$ listed above.
Also verify that this is the same result that the binomial series gives.

Newton discovered the process of algebraic “long division” of polynomi-
als in a similar way. His binomial series with m = −1 gave the geometric
series

$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + x^4 - \cdots,$$

and he subsequently noted that the same result is obtained “by operating
in general variables in the same way as arithmeticians in decimal numbers
divide.” In analogy with the common process for numerical long division,
he sets out the computation in the following way.


    1 + x ) 1  ( 1 - x + x^2 - x^3 ...
            1 + x
            -----
              - x
              - x - x^2
              ---------
                    x^2
                    x^2 + x^3
                    ---------
                        - x^3
                        - x^3 - x^4
                        -----------
                                x^4.
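
The same long division can be carried out mechanically on coefficient lists. The following Python sketch (function name ours) returns the first few coefficients of a quotient of power series, assuming the divisor has a nonzero constant term.

```python
from fractions import Fraction

def series_divide(numerator, denominator, terms):
    """First `terms` coefficients of numerator/denominator as a power series.

    Both arguments are coefficient lists in ascending powers of x, and
    denominator[0] must be nonzero.
    """
    remainder = [Fraction(c) for c in numerator] + [Fraction(0)] * terms
    quotient = []
    for k in range(terms):
        q = remainder[k] / denominator[0]
        quotient.append(q)
        # Subtract q * x^k * denominator from the running remainder.
        for j, d in enumerate(denominator):
            if k + j < len(remainder):
                remainder[k + j] -= q * d
    return quotient

if __name__ == "__main__":
    print([int(c) for c in series_divide([1], [1, 1], 8)])
```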


Actually, Newton had first discovered the geometric series in his early 
binomial work when he investigated by Wallis’ method of tabular inter-
polation the area

$$\int_0^x \frac{dt}{1+t}$$

under the hyperbola $y = 1/(1+x)$ over the interval [0, x].


EXERCISE 20. Calculate the coefficients $a_{m,n}$ in

$$\int_0^x (1+t)^n\,dt = \sum_{m=1}^{\infty} a_{m,n}\,\frac{x^m}{m}$$

for n = 0, 1, 2, 3, 4, so as to verify the entries in Table 6. Note the familiar Pascal
triangle relation

$$a_{m-1,n} + a_{m,n} = a_{m,n+1}$$

for $n \ge 0$. Assuming that this relation holds for n = −1 as well, and that $a_{1,-1} = 1$,
conclude that

$$a_{m,-1} = (-1)^{m+1}.$$

Consequently the result of the “interpolation” is

$$\int_0^x \frac{dt}{1+t} = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$

Table 6

    m \ n   -1     0     1     2     3     4
      1            1     1     1     1     1
      2            0     1     2     3     4
      3            0     0     1     3     6
      4            0     0     0     1     4
      5            0     0     0     0     1
      6            0     0     0     0     0


“Believe it or not,” the computation indicated in Exercise 20 was 
Newton’s original derivation of Mercator’s series! He then noticed that this 
is likewise the result of termwise integration of the geometric series. 

Obviously, the most extraordinary feature of Newton’s remarkable bi- 
nomial investigations is the sequence of invention. He began with the 
quadrature of circular and hyperbolic segments by Wallis’ method of 
tabular interpolation. Next, by termwise differentiation of the results of 
these quadratures, he discovered the binomial series. Finally, the need to 
verify the binomial series led him to the algebraic versions of the familiar 
numerical processes of long division and root extraction. 

The final result of this sequence of investigations—the application to 
infinite series of the familiar procedures of simple arithmetic—was of 
greater importance than any single example such as the binomial series. As 
Boyer puts it, Newton “had found that analysis by infinite series had the 
same inner consistency, and was subject to the same general laws, as the
algebra of finite quantities. Infinite series were no longer to be regarded as
approximating devices only; they were alternative forms of the functions
they represented” ([2], p. 432). As Newton himself somewhat optimistically
phrased it, “whatever common analysis [i.e., ordinary algebra] performs by 
equations made up of a finite number of terms [i.e., polynomials] 
(whenever it may be possible), this method may always perform by infinite 
equations [i.e., infinite series].” (See page 241 of Volume II of Newton’s 
Mathematical Papers). 

Thus was banished forever the “horror of the infinite” that had impeded 
the Greeks, and was set loose the torrent of infinite series expansions that 
were to play a central role in the development and applications of the new 
calculus. 
The Calculus According to Newton


The Discovery of the Calculus 


When we say that the calculus was discovered by Newton and Leibniz in 
the late seventeenth century, we do not mean simply that effective methods 
were then discovered for the solution of problems involving tangents and 
quadratures. For, as we have seen in preceding chapters, such problems 
had been studied with some success since antiquity, and with conspicuous 
success during the half century preceding the time of Newton and Leibniz. 

The previous solutions of tangent and area problems invariably involved 
the application of special methods to particular problems. As successful as 
were, for example, the different tangent methods of Fermat and Roberval, 
neither developed them into general algorithmic procedures. Between these 
special techniques for the solution of individual problems, and the general 
methods of the calculus for the solution of whole classes of related 
problems, we today may see only a moderate gap, but it was one that 
Fermat and Roberval and their early seventeenth century contemporaries 
saw no reason to attempt to bridge. 

What is involved here is the difference between the mere discovery of an 
important fact, and the recognition that it is important—that is, that it 
provides the basis for further progress. In mathematics, the recognition of 
the significance of a concept ordinarily involves its embodiment in new 
terminology or notation that facilitates its application in further investiga- 
tions. As Hadamard remarks, “the creation of a word or a notation for a 
class of ideas may be, and often is, a scientific fact of very great impor- 
tance, because it means connecting these ideas together in our subsequent 
thought” ([2], p. 38).

For example, we have seen that Fermat constructed the difference 
f(A + E)—f(A), noted that (for the polynomial functions he dealt with) it 
contained $E$ as a factor, divided by $E$, and finally cancelled every term still
containing $E$ as a factor, thereby obtaining the quantity

$$\left.\frac{f(A+E) - f(A)}{E}\right|_{E=0}.$$


Of course we now call this quantity the derivative, and denote it by f’(A). 
But Fermat did not call it anything, nor introduce any particular notation 
for it. If he had, the way would have been open for general applications, 
and he might have been (as he has been erroneously called) at least a 
co-discoverer of the differential calculus. 

Perhaps the most clear-cut example in the history of calculus of the difference
between discovery and the recognition of significance is provided by the
“fundamental theorem of calculus,” which explicitly states the inverse relationship
between tangent and area problems (or, in modern terminology, between
differentiation and integration). This relationship was implicit (if not
conspicuous) in the results of the early seventeenth century area computations
that we have discussed in preceding chapters; e.g., the area under
the curve $y = x^n$ over the interval $[0, x]$ is $x^{n+1}/(n+1)$, while the slope of
the tangent line to the curve $y = x^{n+1}/(n+1)$ is $x^n$. Indeed, Barrow stated
and proved (as we have seen in Chapter 5) a geometric theorem that 
clearly enunciated the inverse relationship between tangents and quadra- 
tures. However, he failed to recognize that his “fundamental theorem” 
provided the basis for “a new subject characterized by a distinctive method 
of procedure” (Boyer [1], p. 187). The contribution of Newton and Leibniz, 
for which they are properly credited as the discoverers of the calculus, was 
not merely that they recognized the “fundamental theorem of calculus” as 
a mathematical fact, but that they employed it to distill from the rich 
amalgam of earlier infinitesimal techniques a powerful algorithmic instru- 
ment for systematic calculation. 


Isaac Newton (1642-1727) 


Newton was born on Christmas Day in 1642. Nothing that is known about
his youth and early education heralded the fact that his life and work 
would mark a new age in the intellectual history of mankind. He entered 
Cambridge in the summer of 1661 and received his B.A. early in 1665. 
Upon Barrow’s retirement as Lucasian Professor in 1669, Newton was 
elected as his successor, and remained at Cambridge until 1696, when he 
left for London to serve as Warden of the Mint. Upon his death in 1727 he 
was buried in Westminster Abbey with such pomp that Voltaire remarked, 
“I have seen a professor of mathematics, only because he was great in his
vocation, buried like a king who had done good to his subjects.” 
Apparently Newton did not begin his serious study of mathema- 
tics—beginning with Euclid’s Elements and Descartes’ Geometrie—until 
the summer of 1664. During the two years of 1665 and 1666 when 
Cambridge closed because of the plague, he returned to his country home
in Lincolnshire, and there laid the foundations for the three towering 
achievements of his scientific career—the calculus, the nature of light, and 
the theory of gravitation. Of this biennium mirabilissimum he later wrote 
that “in those days I was in the prime of my age of invention and minded 
mathematics and philosophy more than at any time since.” 

Newton’s Principia Mathematica of 1687 and Opticks of 1704 detailed his 
contributions to mechanics and optics. However, his contributions to pure 
mathematics (including the calculus) remained largely unpublished during 
his lifetime. Mathematical discoveries at that time were not usually 
announced by means of prompt journal publication, because journals 
devoted to mathematics did not yet exist, but were often communicated in 
the form of personal letters and privately circulated manuscripts (and even 
sometimes proposed as riddles). At his death Newton left behind a mass of 
approximately 5000 sheets ([6], p. 70) of unpublished mathematical
manuscripts, some of which had circulated amongst his contemporaries or 
served as a basis for his infrequent mathematical publications. This huge 
corpus of unpublished manuscript defied efforts directed towards its sys- 
tematic organization for almost three centuries, until the appearance (in 
eight volumes from 1967) of the monumental Cambridge edition of The 
Mathematical Papers of Isaac Newton, edited by D. T. Whiteside. 
Throughout this chapter this edition is referenced as [NP]; for example, 
[NP III:2] refers to Newton’s 1671 treatise on methods of series and 
fluxions in Part 2 of Volume III. Similarly, ([NC II], pp. 32-41) refers to
Newton’s first letter to Leibniz, in the second volume of The Correspon- 
dence of Isaac Newton, of which seven volumes have appeared since 1959. 


The Introduction of Fluxions 


In October of 1666 Newton gathered together and organized the results of 
his calculus research during the previous two years into a manuscript later 
referred to as “The October 1666 Tract on Fluxions” ([NP I], pp. 400-448).
This was the first of his formal papers on the calculus. Although unpub- 
lished until recently, apparently copies of the manuscript were seen by a 
few English mathematicians during Newton’s lifetime and after his death. 

Beginning in late 1665, Newton had studied the tangent problem by the 
method of combining the velocity components of a moving point in a 
suitable coordinate system. This approach was (as we have seen in Chapter 
5) previously developed by Roberval, but this earlier work was probably 
unknown to Newton. This investigation of tangents by means of compo- 
nent motions provided both the motivation for the new method of flux- 
ions, and the key to its geometric applications. 

Newton regarded the curve f(x, y) =0 as the locus of the intersection of 
two moving lines, one vertical and the other horizontal. The x and y 
coordinates of the moving point are then functions of the time $t$, specifying
the locations of the vertical and horizontal lines, respectively. The motion
is then the composition of a horizontal motion with velocity vector having
length $\dot{x}$ and a vertical motion with velocity vector having length $\dot{y}$. By the
parallelogram law for the addition of velocity vectors (then well-known in
the case of constant velocity vectors, and here applied to instantaneous
velocity vectors), the tangent velocity vector is the parallelogram sum of
these horizontal and vertical vectors. It follows that the slope of the
tangent line to the curve is $\dot{y}/\dot{x}$ (Fig. 1).


Figure 1 (the curve $f(x, y) = 0$)


On this basis Newton considers the geometrical model of two (or more) 
points A and B traveling distances x and y along different straight lines in 
equal periods of time, such that $f(x, y) = 0$ at all times, with speeds $\dot{x}$ and
$\dot{y}$, respectively, at a given time (Fig. 2).

He does not attempt to define the “fluxional speeds” of the points A and
B, the fluxions $\dot{x}$ and $\dot{y}$ of $x$ and $y$, with which the two points move with
the “flux” (or flowing) of time. Instead, the concept of the speed of a point
moving along a straight line is regarded as intuitively apparent on physical
grounds. In modern terms, the fluxions $\dot{x}$ and $\dot{y}$ are simply the derivatives
of $x$ and $y$ with respect to $t$,

$$\dot{x} = \frac{dx}{dt} \quad \text{and} \quad \dot{y} = \frac{dy}{dt},$$

and their ratio is the derivative of $y$ with respect to $x$,

$$\frac{\dot{y}}{\dot{x}} = \frac{dy}{dx}.$$


It is with some irony that we use the differential notation of Leibniz to 
relate Newton’s work in contemporary terms. It may also be noted that, in 
his early work, Newton generally used other letters, such as $p$ and $q$ instead
of $\dot{x}$ and $\dot{y}$, for the fluxions of $x$ and $y$. The dot notation, now regarded as
characteristically Newtonian, was not consistently adopted by him until 
the early 1690s. 


Figure 2 
Newton’s first problem is that of finding the relationship between the
fluxions $\dot{x}$ and $\dot{y}$, given the relationship $f(x, y) = 0$ between $x$ and $y$. For
the case $f(x, y) = \sum a_{ij} x^i y^j$, a polynomial, he provides the following solution
([NP I], p. 402).


Set all yᵉ termes on one side of yᵉ equation that they become equal to
nothing. And first multiply each terme by so many times $\dot{x}/x$ as $x$ hath
dimensions in that terme. Secondly multiply each terme by so many times
$\dot{y}/y$ as $y$ hath dimensions in it... . the summe of all these products shall
bee equall to nothing. Wᶜʰ Equation gives yᵉ relation of yᵉ velocitys.


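In other words, if $f(x, y) = \sum a_{ij} x^i y^j = 0$, his solution is

$$\sum \left( \frac{i\dot{x}}{x} + \frac{j\dot{y}}{y} \right) a_{ij} x^i y^j = 0. \tag{1}$$


EXERCISE 1. Show that (1) is equivalent to

$$\dot{x}\,\frac{\partial f}{\partial x} + \dot{y}\,\frac{\partial f}{\partial y} = 0$$

in terms of modern partial derivatives. Hence

$$\frac{\dot{y}}{\dot{x}} = -\frac{\partial f/\partial x}{\partial f/\partial y}.$$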

In proof of (1), Newton first observes that if two bodies move with 
uniform (constant) velocities, then the distances traversed are proportional 
to their velocities. He continues, “And though they move not uniformly yet
are yᵉ infinitely little lines wᶜʰ each moment they describe, as their
velocities wᶜʰ they have while they describe yᵐ.” His idea is that, during an
“infinitely short” time interval $o$, the situation is the same as that during a
finite time interval for the case of uniform motion: arbitrary motion is
essentially uniform during an infinitely short time interval. “Soe yᵗ if yᵉ
described lines bee $x$ and $y$, in one moment, they will bee $x + \dot{x}o$ and
$y + \dot{y}o$ in yᵉ next.”

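Illustrating the procedure by example ([NP I], p. 414), he therefore
substitutes $x + \dot{x}o$ for $x$ and $y + \dot{y}o$ for $y$ in the equation $f(x, y) = 0$,

$$\sum a_{ij} (x + \dot{x}o)^i (y + \dot{y}o)^j = 0.$$

Binomial expansion then gives

$$\begin{aligned}
\sum a_{ij} x^i y^j &+ \sum a_{ij} x^i \left( j y^{j-1} \dot{y}o + \text{terms in } o^2 \right) \\
&+ \sum a_{ij} y^j \left( i x^{i-1} \dot{x}o + \text{terms in } o^2 \right) \\
&+ \sum a_{ij} \left( i x^{i-1} \dot{x}o + \cdots \right)\left( j y^{j-1} \dot{y}o + \cdots \right) = 0.
\end{aligned}$$

Applying the fact that $\sum a_{ij} x^i y^j = 0$, and dropping all terms involving $o^2$,
the result is

$$\sum a_{ij} \left( i x^{i-1} y^j \dot{x}o + j x^i y^{j-1} \dot{y}o \right) = 0.$$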


Division by o then gives (1) as desired. In justification of this procedure, 
Newton makes the following observations. 


First yᵗ those termes ever vanish wᶜʰ are not multiplyed by $o$, they being
yᵉ propounded equation. Secondly those termes also vanish in wᶜʰ $o$ is of
more yⁿ one dimension, because they are infinitely lesse yⁿ those in wᶜʰ $o$
is but of one dimension. Thirdly yᵉ still remaining termes, being divided
by $o$ will have [the desired form].


Thus, on the basis of plausible consequences of an intuitive physical 
conception of fluxions as instantaneous velocity components, he is able to 
compute (from (1) as in Exercise 1) the slope $\dot{y}/\dot{x}$ of the tangent line to an
algebraic curve. 
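
Newton's three observations can be watched at work by carrying out the substitution exactly and sorting the result by powers of o. The following sketch is illustrative only; the curve, the point on it, and all function names are hypothetical, and the fluxion ratio 1/10 used below is the value that rule (1) gives at that point.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials in o (coefficient lists, lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add two polynomials in o of possibly different lengths."""
    size = max(len(p), len(q))
    return [(p[k] if k < len(p) else 0) + (q[k] if k < len(q) else 0) for k in range(size)]

def expand_in_o(coeffs, x, y, xdot, ydot):
    """Coefficients, by powers of o, of sum a_ij (x + xdot*o)^i (y + ydot*o)^j."""
    total = [Fraction(0)]
    for (i, j), a in coeffs.items():
        term = [Fraction(a)]
        for _ in range(i):
            term = poly_mul(term, [x, xdot])
        for _ in range(j):
            term = poly_mul(term, [y, ydot])
        total = poly_add(total, term)
    return total

# Hypothetical curve x^3 - 2xy + y^3 - 5 = 0, which passes through (1, 2).
curve = {(3, 0): 1, (1, 1): -2, (0, 3): 1, (0, 0): -5}
expansion = expand_in_o(curve, Fraction(1), Fraction(2), Fraction(1), Fraction(1, 10))
print(expansion)
```

The term free of o vanishes because the point lies on the curve, the coefficient of o vanishes because the fluxions are in the ratio given by (1), and only the terms in higher powers of o, which Newton discards, remain.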


EXERCISE 2. Write $y = x^n$ in the form $f(x, y) = y - x^n = 0$, and conclude from (1)
that

$$\frac{\dot{y}}{\dot{x}} = nx^{n-1}.$$


The Fundamental Theorem of Calculus


Having calculated $dy/dx = \dot{y}/\dot{x}$ from the polynomial equation $f(x, y) = 0$,
Newton poses the converse problem: To find $y$ in terms of $x$, given an
equation expressing the relationship between $x$ and the ratio $\dot{y}/\dot{x}$ of their
fluxions. In case this equation is of the simple form

$$\frac{\dot{y}}{\dot{x}} = \phi(x)$$

this is simply the problem of what we now call antidifferentiation, while
the general case $g(x, \dot{y}/\dot{x}) = 0$ is a differential equation. An indication of
Newton’s insight (as early as 1666) is his remark that “could this ever bee
done all problems whatever might bee resolved” ([NP I], p. 403).


Figure 3


In the fifth and seventh of his list of illustrative problems in the October 
1666 tract, Newton discusses the computation of areas by means of 
antidifferentiation. This is the first historical appearance of the fundamen- 
tal theorem of calculus in the explicit form 


$$\frac{dA}{dx} = y,$$


where A denotes the area under the curve y = f(x), providing the basis for 
an algorithmic approach to the computation of areas. As we have seen, 
areas and tangents had been extensively calculated by ad hoc techniques 
throughout the early seventeenth century; it was the introduction and 
exploitation of general algorithmic techniques, by which these computa- 
tions could be systematized, that constituted Newton’s “discovery of the 
calculus.” 

Whereas previous infinitesimal techniques had been based, in principle, 
on the determination of an area as a limit of a sum (or, more crudely, as a 
sum of infinitesimal or indivisible elements of area), Newton introduced 
here the technique of first determining the rate of change of the desired 
area (with respect to x), and then calculating the area by antidifferentia- 
tion. In combination with his fluxional approach to tangents and rates of 
change, this made clear for the first time the precise nature of the inverse 
relationship between tangent problems and area problems, and the fact 
that both types of calculation are aspects of a single mathematical subject, 
one that is characterized by distinct and generally applicable algorithmic 
procedures. 


Figure 4 (the curve $q = f(x)$)

For Newton’s fluxional formulation and derivation of the fundamental
theorem of calculus, let $y$ denote the area $abc$ under the curve $q = f(x)$
(Fig. 4), and regard this area as being swept out by the vertical segment $bc$
moving to the right with unit velocity $\dot{x} = 1$. If $p = 1$ (in the figure) then the
area of the rectangle is $x$. Now, says Newton, “supposing yᵉ line $cbe$ by
parallel motion to describe yᵉ two [areas] $x$ and $y$; The velocity wᵗʰ wᶜʰ
they increase will bee, as $be$ to $bc$: yᵗ is, yᵉ motion by wᶜʰ $x$ increaseth
being $be = p = 1$, yᵉ motion by wᶜʰ $y$ increaseth will be $bc = q$” ([NP I],
p. 427). Thus he takes it as obvious that the time rate of change of the area
$y$ is $q = f(x)$, with $\dot{x} = 1$, so

$$\frac{\dot{y}}{\dot{x}} = f(x).$$


His crucial insight consisted in the observation of this fact, rather than in a 
rigorous proof in modern terms. No doubt, he thought in terms of an 
increase in the area from y to y + oq, corresponding to an increase from x 
to x + o during an “infinitely small” time interval o. 

As an immediate application, this explained the reciprocal relationship
between the fact that the slope of the curve with ordinate $x^{n+1}/(n+1)$ is $x^n$,
and the fact that the area under the curve with ordinate $x^n$ is $x^{n+1}/(n+1)$.
For if

$$y = \frac{x^{n+1}}{n+1} \quad \text{(the area)}$$

then the computational algorithm (1) gives

$$\frac{\dot{y}}{\dot{x}} = x^n,$$

and conversely. Taking $\dot{x} = p = 1$, corresponding to a unit time rate of
increase of $x$, and $\dot{y} = q$, as in Fig. 4, this is

$$q = x^n \quad \text{(the curve)}.$$


It may be noted that Newton habitually ignored the “constant of integra- 
tion”, taking all of his curves to pass through the origin. 
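
The inverse relationship can also be checked numerically. The sketch below is a modern illustration, not Newton's procedure: it approximates the area under the curve with ordinate t^n by a midpoint sum (an assumed choice of quadrature), compares it with x^(n+1)/(n+1), and then estimates the rate at which the area grows over a short interval o. The names and the sample values of n, x and o are hypothetical.

```python
def area_under_power(n, x, steps=100_000):
    """Midpoint-rule approximation to the area under q = t^n over [0, x]."""
    h = x / steps
    return sum(((k + 0.5) * h) ** n for k in range(steps)) * h

n, x, o = 3, 1.5, 1e-4
area = area_under_power(n, x)
print(area, x ** (n + 1) / (n + 1))   # the two values should agree closely

# Growth of the area over the short interval o, divided by o, approximates q = x^n.
rate = (area_under_power(n, x + o) - area_under_power(n, x)) / o
print(rate, x ** n)
```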


The Chain Rule and Integration by Substitution 


The tangent and area problems emphasize the importance of systematic 
procedures for differentiation (the calculation of $\dot{y}/\dot{x}$, given $f(x, y) = 0$)
and antidifferentiation (the converse). Newton exploited the facility for
differentiation and antidifferentiation by substitution methods (equivalent
to what we call the chain rule and integration by substitution) that is
essentially “built into” the calculus of fluxions.




As an example of the “built in” chain rule, suppose we want to calculate
$\dot{y}/\dot{x}$ if

$$y = (1 + x^n)^{3/2}.$$

Newton would introduce a new variable $z = 1 + x^n$ with fluxion

$$\dot{z} = nx^{n-1}\dot{x}. \tag{2}$$

Then $y^2 = z^3$, so it follows that

$$2y\dot{y} = 3z^2\dot{z}. \tag{3}$$


EXERCISE 3. Apply the computational algorithm (1) to verify (2) and (3).


But from (2) and (3) we conclude that

$$\frac{\dot{y}}{\dot{x}} = \frac{\dot{y}/\dot{z}}{\dot{x}/\dot{z}} = \frac{3z^2/2y}{1/nx^{n-1}} = \frac{3nx^{n-1}(1 + x^n)^2}{2(1 + x^n)^{3/2}} = \frac{3}{2}\,nx^{n-1}\sqrt{1 + x^n}.$$
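
The two-step computation above can be checked against a direct numerical estimate. This is only a sketch: the function names are hypothetical, floating-point arithmetic is assumed to be adequate, and the symmetric difference quotient is a modern device used purely as an independent check.

```python
def slope_by_substitution(x, n):
    """ydot/xdot for y = (1 + x^n)^(3/2), assembled from equations (2) and (3)."""
    z = 1 + x ** n
    y = z ** 1.5
    zdot_over_xdot = n * x ** (n - 1)       # equation (2)
    ydot_over_zdot = 3 * z ** 2 / (2 * y)   # equation (3)
    return ydot_over_zdot * zdot_over_xdot

def slope_by_difference(x, n, o=1e-6):
    """Symmetric difference quotient of the same function."""
    f = lambda t: (1 + t ** n) ** 1.5
    return (f(x + o) - f(x - o)) / (2 * o)

print(slope_by_substitution(2.0, 3), slope_by_difference(2.0, 3))
```

For x = 2 and n = 3 the closed form (3/2) n x^(n-1) sqrt(1 + x^n) equals 54, which both functions should reproduce.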


This is an illustration of the following general procedure that Newton
employed (in specific examples) to differentiate

$$y = [f(x)]^{m/n},$$

where $f(x)$ is a polynomial. First introduce the new variable $z = f(x)$ with
fluxion $\dot{z} = f'(x)\dot{x}$. Then $y^n = z^m$, so it follows that

$$ny^{n-1}\dot{y} = mz^{m-1}\dot{z}$$

by application of (1). Finally

$$\frac{\dot{y}}{\dot{x}} = \frac{\dot{y}/\dot{z}}{\dot{x}/\dot{z}} = \frac{mz^{m-1}/ny^{n-1}}{1/f'(x)} = \frac{m f'(x)[f(x)]^{m-1}}{n[f(x)]^{(m/n)(n-1)}} = \frac{m}{n}\,[f(x)]^{(m/n)-1} f'(x),$$

the familiar “power formula” result of elementary calculus.

In a similar manner he is able to differentiate products and quotients, 
although he does not at this time formally state the product and quotient 
rules as explicit algorithms. Instead he illustrates by means of examples the 
following techniques. If 


$$y = f(x)g(x),$$

let $u = f(x)$ and $v = g(x)$ have fluxions $\dot{u}$ and $\dot{v}$, respectively. If $f(x)$ and
$g(x)$ are polynomials, then the basic rule (1) for the computation of the
ratio of fluxions gives

$$\dot{y} = \dot{u}v + u\dot{v}, \qquad \dot{u} = f'(x)\dot{x}, \qquad \dot{v} = g'(x)\dot{x}. \tag{4}$$

Hence

$$\frac{\dot{y}}{\dot{x}} = \frac{\dot{u}v + u\dot{v}}{\dot{x}} = f(x)g'(x) + f'(x)g(x).$$


EXERCISE 4. Apply (1) to verify (4). 


If $y = f(x)/g(x) = u/v$, then $yv = u$ so

$$\dot{y}v + y\dot{v} = \dot{u}. \tag{5}$$

Hence

$$\frac{\dot{y}}{\dot{x}} = \frac{f'(x) - g'(x)[f(x)/g(x)]}{g(x)} = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}.$$


EXERCISE 5. Apply (1) to verify (5). 


Newton describes this basic substitution process in the following
language ([NP I], p. 411):


Note yᵗ if there happen to bee in any Equation either a fraction or surde
quantity ... To find in what proportion the unknowne quantitys increase
or decrease doe thus. 1. Take two letters yᵉ one (as $\xi$) to signify yᵗ
quantity, yᵉ other (as $\dot{\xi}$) its motion of increase or decrease: And making
an Equation betwixt yᵉ letter $\xi$ & yᵉ quantity signifyed by it, find thereby
[by (1)] yᵉ valor of the other letter $\dot{\xi}$. 2. Then substituting yᵉ letter $\xi$
signifying yᵗ quantity, into its place in yᵉ maine Equation esteeme yᵗ letter
$\xi$ as an unknowne quantity & performe yᵉ worke of [(1)]; & into yᵉ
resulting Equation instead of those letters $\xi$ & $\dot{\xi}$ substitute theire valors.
And soe you have yᵉ equation required.


For an example similar to one of Newton’s, consider

$$y^2 = \frac{x}{\sqrt{a^2 - x^2}}.$$

First let

$$\xi = \frac{x}{\sqrt{a^2 - x^2}}.$$
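EXERCISE 6. Apply (1) to obtain

$$2a^2\xi\dot{\xi} - 2x\dot{x}\xi^2 - 2x^2\xi\dot{\xi} = 2x\dot{x},$$

so

$$\dot{\xi} = \frac{x(1 + \xi^2)\dot{x}}{(a^2 - x^2)\xi}.$$


This is the first of Newton’s three steps.
Now “yᵉ maine Equation” is $y^2 = \xi$, so

$$2y\dot{y} = \dot{\xi} = \frac{x(1 + \xi^2)\dot{x}}{(a^2 - x^2)\xi}$$

by Exercise 6. Hence

$$\begin{aligned}
\frac{\dot{y}}{\dot{x}} &= \frac{x(1 + \xi^2)}{2y(a^2 - x^2)\xi} \\
&= \frac{x\left[1 + x^2/(a^2 - x^2)\right]}{2x^{1/2}(a^2 - x^2)^{-1/4}\,(a^2 - x^2)\,x(a^2 - x^2)^{-1/2}} \\
&= \tfrac{1}{2}\,a^2 x^{-1/2}(a^2 - x^2)^{-5/4}.
\end{aligned}$$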


After several such examples, he concluded that “how to proceed in other
cases (as when there are cube rootes, surde denominators, rootes within
rootes (as $\sqrt{ax + \sqrt{aa - xx}}$) etc. in the equation) may bee easily deduced
from what hath been already said” ([NP I], p. 413).

He applied a similar substitution technique to construct a fairly extensive
table of antiderivatives ([NP I], pp. 405-410). To start with, he showed
that if

$$\frac{\dot{y}}{\dot{x}} = \frac{x^{n-1}}{a + bx^n}, \quad \text{then} \quad y = \square\,\frac{1}{nab + nbz}. \tag{6}$$

(Read “area under” for $\square$, Newton’s usual integral notation.) To see this,
substitute $z = bx^n$, so $\dot{z} = nbx^{n-1}\dot{x}$ as usual. Then

$$\frac{\dot{y}}{\dot{z}} = \frac{\dot{y}/\dot{x}}{\dot{z}/\dot{x}} = \frac{x^{n-1}/(a + bx^n)}{nbx^{n-1}} = \frac{1}{nb} \cdot \frac{1}{a + z},$$

so that (6) follows from the fundamental theorem of calculus.


EXERCISE 7. If $\dot{y}/\dot{x} = cx^{n-1}\sqrt{a + bx^n + cx^{2n}}$, substitute $z = x^n$ in this manner to
show that

$$y = \square\,\frac{c}{n}\sqrt{a + bz + cz^2}.$$


EXERCISE 8. If $\dot{y}/\dot{x} = (c/x)\sqrt{ax^n + bx^{2n}}$, substitute $z^2 = x^n$, so $2z\dot{z} = nx^{n-1}\dot{x}$, to
show that $y = \square\,(2c/n)\sqrt{a + bz^2}$.


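In some cases, such substitutions lead to exact quadratures. For example,
if

$$\frac{\dot{y}}{\dot{x}} = \frac{x^{n-1}}{\sqrt{a + bx^n}},$$

then the substitution $z = x^n$ leads to

$$\frac{\dot{y}}{\dot{z}} = \frac{1}{n\sqrt{a + bz}},$$

so

$$y = \frac{2}{nb}\sqrt{a + bz} = \frac{2}{nb}\sqrt{a + bx^n}$$

(as may be verified by straightforward differentiation).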

In cases such as Exercises 7 and 8, where an exact quadrature (in
algebraic terms) seemed impossible, Newton’s general goal was to reduce
the quadrature to the calculation of the area under the graph of a circular
or hyperbolic function, such as

$$\square\,\frac{a}{b + cx} \quad \text{or} \quad \square\,\sqrt{a^2 - x^2}.$$

These areas he calculated by binomial expansion followed by termwise
integration. For example, if

$$\frac{\dot{y}}{\dot{x}} = \frac{a}{b + cx} = \frac{a}{b} - \frac{acx}{b^2} + \frac{ac^2x^2}{b^3} - \cdots,$$

then

$$y = \frac{ax}{b} - \frac{acx^2}{2b^2} + \frac{ac^2x^3}{3b^3} - \cdots. \tag{7}$$

Similarly, if

$$\frac{\dot{y}}{\dot{x}} = \sqrt{a^2 - x^2} = a - \frac{x^2}{2a} - \frac{x^4}{8a^3} - \cdots,$$

then

$$y = ax - \frac{x^3}{6a} - \frac{x^5}{40a^3} - \cdots. \tag{8}$$
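
The circular quadrature by binomial expansion and termwise integration can be compared with the closed-form area known today. The sketch below is an illustration under stated assumptions: it generates the generalized binomial coefficients of (1 - t^2/a^2)^(1/2), integrates term by term, and compares the sum with (1/2)[x sqrt(a^2 - x^2) + a^2 arcsin(x/a)]. The function names and the sample values x = 0.5, a = 1 are hypothetical.

```python
from fractions import Fraction
from math import asin, sqrt

def binomial_coeff(alpha, k):
    """Generalized binomial coefficient C(alpha, k) for a rational exponent alpha."""
    c = Fraction(1)
    for i in range(k):
        c *= (alpha - i) / Fraction(i + 1)
    return c

def circle_area_series(x, a, terms=12):
    """Termwise integral over [0, x] of a * sum C(1/2, k) (-t^2/a^2)^k."""
    half = Fraction(1, 2)
    return sum(float(binomial_coeff(half, k)) * (-1) ** k * a * x ** (2 * k + 1)
               / (a ** (2 * k) * (2 * k + 1)) for k in range(terms))

x, a = 0.5, 1.0
exact = 0.5 * (x * sqrt(a * a - x * x) + a * a * asin(x / a))
print(circle_area_series(x, a), exact)
```

The first three terms produced are ax, -x^3/(6a) and -x^5/(40a^3), in agreement with the series for the circular area.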


Applications of Infinite Series 


In 1668 Mercator’s Logarithmotechnia appeared, containing his famous 
series for log (1 + x), obtained by long division of 1+ x into 1, followed by 
the equivalent of termwise integration of the resulting infinite series. 
Spurred by this publication of an important result contained in his own
unpublished work of several years earlier, Newton wrote up in early
summer 1669 a brief compendium of his results under the title De Analysi
per Aequationes Numero Terminorum Infinitas (On Analysis by Equations
Unlimited in the Number of Their Terms) ([NP II], pp. 206-247).
Although the De Analysi itself remained unpublished until 1711, and
contained only a fraction of Newton’s 1664-1666 work, it was privately
circulated and served to introduce him and his work to certain members of 
the English mathematical community. 

The De Analysi opens with a description, by means of rules stated 
without proof, of his general method (by application of the fundamental 
theorem) for computing the area under a curve y = f(x) (as Newton says, 
“rather briefly explained than narrowly demonstrated”).

The first rule recalls that if $y = ax^{m/n}$ then the desired area is

$$\frac{a}{(m/n) + 1}\,x^{(m/n)+1} = \frac{an}{m+n}\,x^{(m+n)/n}.$$


The second rule (“If the value of y is compounded of several terms of that
kind the area also will be compounded of the areas which arise separately
from each of those terms”) asserts the validity of termwise integration.

The third rule states that “if the value of y or any of its terms be more 
compounded than the foregoing (that is, is not a polynomial), it must be 
reduced to simpler terms, by operating in general terms in the same way as 
arithmeticians in decimal numbers divide, extract roots or solve affected 
equations.” 

His examples “by division” and “by root extraction” are essentially the 
same as the computations by series of circular and hyperbolic areas 
mentioned at the end of the previous section (Equations (7) and (8)). 


Newton’s Method 


In preparation for the computation of areas by “the resolution of affected 
equations,” Newton introduces by example the technique for approximat- 
ing solutions of equations that is now known as “Newton’s method.” In 
order to “resolve” the equation

$$y^3 - 2y - 5 = 0, \tag{9}$$

he starts with the approximation 2 to its root. Substitution of $y = p + 2$ into
(9) yields the equation

$$p^3 + 6p^2 + 10p - 1 = 0$$


for p. Neglecting the nonlinear terms in p, he solves 10p — 1=0 for the 
approximation p = 0.1, so 2.1 is his second approximation to the root. He 
then substitutes $y = q + 2.1$ into (9) and obtains the equation

$$q^3 + 6.3q^2 + 11.23q + 0.061 = 0$$

for $q$. Again neglecting the higher degree terms, he solves $11.23q + 0.061 = 0$
for $q = -0.0054$. This yields his third approximation 2.0946 to the actual
root (2.09455148 to 8 decimal places).
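
Newton's substitute-and-linearize procedure can be imitated directly on the coefficients. The sketch below is illustrative: it assumes the polynomial is stored lowest degree first, re-expands it about the current approximation by repeated synthetic division (a modern convenience, not Newton's layout), and then solves the linear part for the correction. The names `shift` and `newton_original` are hypothetical.

```python
def shift(coeffs, c):
    """Coefficients (lowest degree first) of f(c + p) as a polynomial in p."""
    a = list(coeffs)[::-1]              # highest degree first
    n = len(a)
    for i in range(n - 1):
        for j in range(1, n - i):
            a[j] += a[j - 1] * c        # one pass of synthetic division by (y - c)
    return a[::-1]

def newton_original(coeffs, start, steps):
    """Substitute y = approx + p, keep only the linear part, and solve for p."""
    approx = start
    for _ in range(steps):
        shifted = shift(coeffs, approx)
        approx -= shifted[0] / shifted[1]
    return approx

cubic = [-5.0, -2.0, 0.0, 1.0]          # y^3 - 2y - 5, lowest degree first
print(shift(cubic, 2.0))                # coefficients of p^3 + 6p^2 + 10p - 1
print(newton_original(cubic, 2.0, 2))   # compare with Newton's rounded value 2.0946
```

The second printed value differs slightly from 2.0946 because Newton rounded the correction q to -0.0054 before adding it.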

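This method for solving the polynomial equation

$$f(x) = \sum_{i=0}^{k} a_i x^i = 0 \tag{10}$$

may be described as follows. Given an approximation $x_n$ to the actual root
$x_*$, we substitute $x_* = x_n + p$ into (10), obtaining

$$\begin{aligned}
0 = \sum_{i=0}^{k} a_i x_*^i &= \sum_{i=0}^{k} a_i (x_n + p)^i \\
&= \sum_{i=0}^{k} a_i \left( x_n^i + i x_n^{i-1} p + \cdots \right) \\
&= \sum_{i=0}^{k} a_i x_n^i + p \sum_{i=0}^{k} i a_i x_n^{i-1} + \cdots, \\
0 &= f(x_n) + p f'(x_n) + \cdots,
\end{aligned}$$

where the dots indicate higher degree terms in $p$. Neglecting these higher
degree terms, we obtain

$$p = -\frac{f(x_n)}{f'(x_n)},$$

so

$$x_* \approx x_n - \frac{f(x_n)}{f'(x_n)} = x_{n+1}, \tag{11}$$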


the familiar formula for the (n+1)st approximation using Newton’s 
method. 
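
Formula (11) translates into a few lines of code. This is a minimal sketch of the modern iteration, with a hypothetical function name and with the derivative supplied by hand; it is applied here to Newton's cubic y^3 - 2y - 5 starting from 2.

```python
def newton_iterates(f, fprime, x0, steps):
    """Successive approximations x_{n+1} = x_n - f(x_n)/f'(x_n), as in formula (11)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

f = lambda y: y ** 3 - 2 * y - 5
fprime = lambda y: 3 * y ** 2 - 2
iterates = newton_iterates(f, fprime, 2.0, 5)
for n, x in enumerate(iterates):
    print(n, x)
```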

It is interesting to note that Newton nowhere mentions the standard
geometrical derivation of (11), that is, from the fact that the derivative
$f'(x_n)$ is the slope of the hypotenuse of the right triangle in Figure 5.

Newton next sketches by example a generalization of the above method
for solving an equation of the form

$$f(x, y) = 0$$

for $y$ as a power series in $x$. His example is the equation

$$y^3 + a^2y + axy - x^3 - 2a^3 = 0.$$


Figure 5


Figure 6


EXERCISE 9. Let $a = 1$ in this equation, so

$$y^3 + y + xy - x^3 - 2 = 0. \tag{12}$$

When $x = 0$ the real solution is evidently $y = 1$, the first term of the power series for
$y$. Substitute $y = 1 + p$ into (12), and discard the nonlinear terms in $p$ and $x$ in the
resulting equation (in $p$ and $x$) to obtain

$$p = -\tfrac{1}{4}x.$$

Next substitute $p = -\tfrac{1}{4}x + q$ into the preceding equation in $p$ and $x$, and discard
all terms of degree higher than 2 to obtain $q = x^2/64$. Thus the first three terms of
the power series for $y$ are

$$y = 1 - \tfrac{1}{4}x + \tfrac{1}{64}x^2 + \cdots.$$

Newton’s idea (not exploited extensively in the De Analysi) is to calculate
the area (Fig. 6) under a curve $f(x, y) = 0$ by “resolving the affected
equation” for $y$ as a power series in $x$, and then integrating this power
series termwise (applying his first two rules).
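
A quick numerical check of Exercise 9 is possible. The sketch below assumes the three-term series y = 1 - x/4 + x^2/64 indicated by that exercise and evaluates the left side of equation (12) with a = 1; if the series is correct through second order, the residual should shrink roughly like x^3. The function name is hypothetical.

```python
def residual(x):
    """Left side of y^3 + y + xy - x^3 - 2 = 0 at the truncated series for y."""
    y = 1 - x / 4 + x ** 2 / 64
    return y ** 3 + y + x * y - x ** 3 - 2

for x in (0.1, 0.01):
    print(x, residual(x))
```

Reducing x by a factor of ten reduces the residual by about a factor of a thousand, as expected of a remainder of third order.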




The Reversion of Series 


In the De Analysi Newton applies his method of successive approximations
mainly to the “reversion of series”. For example, given his series

$$z = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \tfrac{1}{4}x^4 + \tfrac{1}{5}x^5 - \cdots$$

for the area $z$ under the hyperbola $y = 1/(1 + x)$, he wants to solve for $x$ in
terms of $z$ (Fig. 7).

He decides to solve only for the first five terms in the series for $x$, and
therefore drops all terms of degree higher than five, obtaining

$$\tfrac{1}{5}x^5 - \tfrac{1}{4}x^4 + \tfrac{1}{3}x^3 - \tfrac{1}{2}x^2 + x - z = 0. \tag{13}$$

Deletion of all nonlinear terms gives the first approximation

$$x = z.$$


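Substitution of $x = z + p$ into (13) yields

$$\left(-\tfrac{1}{2}z^2 + \tfrac{1}{3}z^3 - \tfrac{1}{4}z^4 + \tfrac{1}{5}z^5\right) + p\left(1 - z + z^2 - z^3 + z^4\right) + p^2\left(-\tfrac{1}{2} + z - \tfrac{3}{2}z^2 + 2z^3\right) + \cdots = 0. \tag{14}$$

Neglecting nonlinear terms in $p$ we obtain

$$p = \frac{\tfrac{1}{2}z^2 - \tfrac{1}{3}z^3 + \tfrac{1}{4}z^4 - \tfrac{1}{5}z^5}{1 - z + z^2 - z^3 + z^4} = \tfrac{1}{2}z^2 + \cdots,$$

so our second approximation is

$$x = z + \tfrac{1}{2}z^2.$$

Substitution of $p = \tfrac{1}{2}z^2 + q$ into (14) yields

$$\left(-\tfrac{1}{6}z^3 + \tfrac{1}{8}z^4 - \tfrac{1}{20}z^5\right) + q\left(1 - z + \tfrac{1}{2}z^2\right) + \cdots = 0,$$

so

$$q = \frac{\tfrac{1}{6}z^3 - \tfrac{1}{8}z^4 + \tfrac{1}{20}z^5}{1 - z + \tfrac{1}{2}z^2} = \tfrac{1}{6}z^3 + \cdots.$$

Thus the third approximation for $x$ is

$$x = z + \tfrac{1}{2}z^2 + \tfrac{1}{6}z^3.$$

Continuing in this fashion Newton derives

$$x = z + \tfrac{1}{2}z^2 + \tfrac{1}{6}z^3 + \tfrac{1}{24}z^4 + \tfrac{1}{120}z^5 + \cdots. \tag{15}$$

Since $z = \log(1 + x)$ is equivalent to $x = e^z - 1$, Newton has in (15)
derived the exponential series

$$e^z = 1 + z + \tfrac{1}{2}z^2 + \tfrac{1}{6}z^3 + \cdots \tag{16}$$

for the first time.


Figure 7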
In brief, the above method for the reversion of the series

$$z = a_1x + a_2x^2 + \cdots$$

to solve for

$$x = b_1z + b_2z^2 + \cdots$$

is as follows. Having found $b_1 (= 1/a_1), b_2, \ldots, b_{n-1}$, substitute

$$x = b_1z + \cdots + b_{n-1}z^{n-1} + r$$

into the original series and collect terms in the form

$$(Az^n + Bz^{n+1} + \cdots) + r(A' + B'z + \cdots) + \cdots = 0.$$

Then

$$r = -\frac{Az^n + Bz^{n+1} + \cdots}{A' + B'z + \cdots} = -\frac{A}{A'}z^n + \cdots,$$

so we take $b_n = -A/A'$.
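
The term-by-term scheme just described lends itself to a short program. The following sketch is a modern paraphrase with exact fractions, not Newton's tabular layout: it assumes the series are stored as coefficient lists with index equal to the power, and it uses the fact that the constant A' in the coefficient of r is simply a_1. The names `mul`, `compose` and `revert` are hypothetical.

```python
from fractions import Fraction

def mul(p, q, order):
    """Product of two power series, truncated after the term of degree `order`."""
    out = [Fraction(0)] * (order + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= order:
                out[i + j] += pi * qj
    return out

def compose(a, b, order):
    """Coefficients up to z^order of sum_k a[k] * b(z)^k, where a[0] = b[0] = 0."""
    result = [Fraction(0)] * (order + 1)
    power = [Fraction(1)] + [Fraction(0)] * order
    for k in range(1, order + 1):
        power = mul(power, b, order)
        for d in range(order + 1):
            result[d] += a[k] * power[d]
    return result

def revert(a, order):
    """Given z = a[1] x + a[2] x^2 + ..., return b with x = b[1] z + b[2] z^2 + ..."""
    b = [Fraction(0)] * (order + 1)
    b[1] = 1 / Fraction(a[1])
    for n in range(2, order + 1):
        A = compose(a, b, n)[n]     # coefficient of z^n while b[n] is still zero
        b[n] = -A / a[1]            # b_n = -A/A' with A' = a_1
    return b

# The series for the hyperbolic area: z = x - x^2/2 + x^3/3 - ...
log_series = [Fraction(0)] + [Fraction((-1) ** (k + 1), k) for k in range(1, 6)]
print(revert(log_series, 5)[1:])
```

Applied to the logarithmic series, the reverted coefficients are 1, 1/2, 1/6, 1/24, 1/120, the coefficients of the exponential series.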


Discovery of the Sine and Cosine Series 


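Newton proceeds to apply these techniques to obtain the power series for
$\sin x$ and $\cos x$ for the first time. He derives first the series

$$\sin^{-1}x = x + \tfrac{1}{6}x^3 + \tfrac{3}{40}x^5 + \tfrac{5}{112}x^7 + \cdots \tag{17}$$

as follows. Consider the circle $x^2 + y^2 = 1$, as in Fig. 8. Then the angle
$\theta = \sin^{-1}x$ (measured in radians) is twice the area of the circular sector
$OQR$. But Newton knew, as a result of having integrated $\sqrt{1 - x^2}$
termwise after binomial expansion (see Equation (8)), the area of the
segment $OPQR$ to be

$$x - \tfrac{1}{6}x^3 - \tfrac{1}{40}x^5 - \tfrac{1}{112}x^7 - \cdots.$$


Figure 8


It therefore followed that

$$\begin{aligned}
\theta &= 2\left(x - \tfrac{1}{6}x^3 - \tfrac{1}{40}x^5 - \tfrac{1}{112}x^7 - \cdots\right) - x\sqrt{1 - x^2} \\
&= 2\left(x - \tfrac{1}{6}x^3 - \tfrac{1}{40}x^5 - \tfrac{1}{112}x^7 - \cdots\right) - \left(x - \tfrac{1}{2}x^3 - \tfrac{1}{8}x^5 - \tfrac{1}{16}x^7 - \cdots\right), \\
\theta &= x + \tfrac{1}{6}x^3 + \tfrac{3}{40}x^5 + \tfrac{5}{112}x^7 + \cdots
\end{aligned}$$

as desired (where $\sqrt{1 - x^2}$ has been expanded by the binomial series).
Newton then obtained his series for the sine by reversion of series (17)!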
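
The area argument for the arcsine series reduces to simple arithmetic on the binomial coefficients of the square root of 1 - x^2. The sketch below is illustrative, with hypothetical function names: each coefficient c_k of x^(2k) contributes 2 c_k/(2k + 1) from twice the segment area and -c_k from the subtracted product x times the square root.

```python
from fractions import Fraction

def sqrt_one_minus_x2(terms):
    """Binomial-series coefficients c[k] of x^(2k) in sqrt(1 - x^2)."""
    coeffs, c = [], Fraction(1)
    for k in range(terms):
        coeffs.append(c)
        c = c * (Fraction(1, 2) - k) / (k + 1) * -1
    return coeffs

def arcsine_coefficients(terms):
    """Coefficients of x^(2k+1) in 2*(segment area) - x*sqrt(1 - x^2)."""
    c = sqrt_one_minus_x2(terms)
    return [2 * c[k] / (2 * k + 1) - c[k] for k in range(terms)]

print(arcsine_coefficients(4))
```

The four values produced are 1, 1/6, 3/40 and 5/112, the coefficients of series (17).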


EXERCISE 10. Apply Newton’s method of successive approximations to invert the
above series for $\sin^{-1}x$ and obtain

$$x = \sin\theta = \theta - \tfrac{1}{6}\theta^3 + \tfrac{1}{120}\theta^5 - \tfrac{1}{5040}\theta^7 + \cdots. \tag{18}$$


EXERCISE 11. Calculate the square root to obtain the series for the cosine,

$$\cos\theta = \sqrt{1 - \sin^2\theta} = 1 - \tfrac{1}{2}\theta^2 + \tfrac{1}{24}\theta^4 - \tfrac{1}{720}\theta^6 + \cdots. \tag{19}$$

This can be done by either (i) direct algebraic root extraction, (ii) application of the
binomial series, or (iii) by application of Newton’s method of successive approximations
to solve the equation

$$y^2 + \left(\theta - \tfrac{1}{6}\theta^3 + \tfrac{1}{120}\theta^5 - \cdots\right)^2 = 1$$

for $y = \cos\theta$.


Having carried out these computations, Newton finally noticed the
“obvious” factorial pattern of the coefficients,

$$\sin\theta = \sum_{k=0}^{\infty} \frac{(-1)^k\theta^{2k+1}}{(2k+1)!}, \qquad \cos\theta = \sum_{k=0}^{\infty} \frac{(-1)^k\theta^{2k}}{(2k)!}.$$
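
Route (i) of Exercise 11, direct root extraction, can be carried out on truncated series with exact fractions. The sketch below is only an illustration: it assumes the sine coefficients follow the factorial pattern through the seventh power, forms 1 - sin^2, and extracts the square root one coefficient at a time. All names are hypothetical.

```python
from fractions import Fraction
from math import factorial

ORDER = 8   # keep powers of theta up to theta^8

def mul(p, q):
    """Product of two truncated power series (coefficient lists, lowest degree first)."""
    out = [Fraction(0)] * (ORDER + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= ORDER:
                out[i + j] += pi * qj
    return out

# Sine coefficients in the factorial pattern, through theta^7.
sine = [Fraction(0)] * (ORDER + 1)
for k in range(ORDER // 2):
    sine[2 * k + 1] = Fraction((-1) ** k, factorial(2 * k + 1))

target = [-c for c in mul(sine, sine)]   # -sin^2
target[0] += 1                           # 1 - sin^2

# Root extraction: choose cosine[n] so that the square matches target[n].
cosine = [Fraction(1)] + [Fraction(0)] * ORDER
for n in range(1, ORDER + 1):
    cross = sum(cosine[i] * cosine[n - i] for i in range(1, n))
    cosine[n] = (target[n] - cross) / 2

print(cosine[0::2])
```

The even coefficients come out as 1, -1/2, 1/24, -1/720 and 1/40320, matching the factorial pattern for the cosine.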


In the De Analysi, Newton actually obtained (18) by inverting a series
for the arclength OR. The above computation, based on area rather than
arclength, is taken from his original investigation of 1665 (see [NP I],
p. 110, where he exuberantly writes out the series for $\sin^{-1}x$ through the
term in x7).

Newton next applied his trigonometric series to calculate the areas under 
the cycloid and the quadratrix ({NP II], pp. 239-241). He takes the cycloid 
ADG (Fig. 9) defined with respect to the circle of diameter AH =1 by the 
relation 


oe 
BD = BK + AK, 


that is, DK is equal to the length of the circular arc AK. If AB= x, then 


BK =Vx-x? 


1 1 
= x1/2_143/2_15/2_ 1 y 7/2... 


by the binomial formula, and the circular arc from A to K is 


$$\widehat{AK} = \tfrac{1}{2}\theta = \sin^{-1}\frac{AK}{AH} = \sin^{-1}\sqrt{x} = x^{1/2} + \tfrac{1}{6}x^{3/2} + \tfrac{3}{40}x^{5/2} + \tfrac{5}{112}x^{7/2} + \cdots$$


by the above series for the arcsine. Consequently the Cartesian equation of 
the cycloid is 


$$y = \sqrt{x - x^2} + \sin^{-1}\sqrt{x} = 2x^{1/2} - \tfrac{1}{3}x^{3/2} - \tfrac{1}{20}x^{5/2} - \tfrac{1}{56}x^{7/2} - \cdots.$$


By termwise integration the area ABD under the cycloid is therefore 


$$\tfrac{4}{3}x^{3/2} - \tfrac{2}{15}x^{5/2} - \tfrac{1}{70}x^{7/2} - \tfrac{1}{252}x^{9/2} - \cdots.$$
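The coefficients in the cycloid computation can be reproduced with exact fractions. This is a sketch only; the function names are hypothetical, and the list index k stands for the power $x^{k+1/2}$ in the ordinate and $x^{k+3/2}$ in the area.

```python
from fractions import Fraction


def sqrt_one_minus_x(n):
    """First n coefficients of (1 - x)**(1/2) from the binomial series."""
    coeffs = [Fraction(1)]
    for k in range(n - 1):
        coeffs.append(coeffs[-1] * Fraction(2 * k - 1, 2) / (k + 1))
    return coeffs


def arcsin_odd(n):
    """First n coefficients a_m in arcsin u = sum of a_m * u**(2m + 1)."""
    coeffs, central = [], Fraction(1)
    for m in range(n):
        coeffs.append(central / (2 * m + 1))
        central = central * (2 * m + 1) * (2 * m + 2) / (4 * (m + 1) ** 2)
    return coeffs


N = 4
# y = sqrt(x - x^2) + arcsin(sqrt(x)); entry k multiplies x**(k + 1/2)
ordinate = [b + a for b, a in zip(sqrt_one_minus_x(N), arcsin_odd(N))]
# termwise integration: x**(k + 1/2) becomes x**(k + 3/2) / (k + 3/2)
area = [c / Fraction(2 * k + 3, 2) for k, c in enumerate(ordinate)]

print(ordinate)
print(area)
```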


Figure 9


Figure 10


The quadratrix, a “mechanical” curve that was employed by the Greeks 
to square the circle and trisect an angle, is defined as follows (Fig. 10). 
Suppose the point B starts at the point $E(0, \pi/2)$ on the y-axis and moves
to the origin A with constant speed during one unit of time, so its height at
time t is


$$y = \frac{\pi}{2}(1 - t).$$


At the same time suppose the point K starts at b(0, 1) and moves down the
unit quarter-circle bKV to V with constant angular speed in one unit of
time, so its angle at time t is


$$\theta = \frac{\pi}{2}(1 - t).$$


Then the typical point D on the quadratrix is the intersection of the
horizontal line through B and the radial ray through K. From the two
expressions above we have

$$\frac{y}{x} = \tan\theta = \tan y$$

or

$$x = y\cot y$$


as the Cartesian equation of the quadratrix. 
Now Newton has x and y interchanged (Fig. 11), so his equation is 
y =x cot x. Substituting the series for the sine and cosine, he obtains 


$$\begin{aligned}
y = x\,\frac{\cos x}{\sin x} &= x\cdot\frac{1 - (1/2)x^2 + (1/24)x^4 - (1/720)x^6 + \cdots}{x - (1/6)x^3 + (1/120)x^5 - (1/5040)x^7 + \cdots} \\
&= \frac{1 - (1/2)x^2 + (1/24)x^4 - (1/720)x^6 + \cdots}{1 - (1/6)x^2 + (1/120)x^4 - (1/5040)x^6 + \cdots} \\
&= 1 - \tfrac{1}{3}x^2 - \tfrac{1}{45}x^4 - \tfrac{2}{945}x^6 - \cdots
\end{aligned}$$


Figure 11 


by division, so termwise-integration gives 


$$x - \tfrac{1}{9}x^3 - \tfrac{1}{225}x^5 - \tfrac{2}{6615}x^7 - \cdots$$


for the area ABDV. 
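The division of series and the termwise integration for the quadratrix can be sketched in exact arithmetic as follows. The helper names `divide` and `integrate` are hypothetical, and the long-division recurrence is a modern restatement rather than Newton's tabular layout.

```python
from fractions import Fraction
from math import factorial

N = 8  # keep the coefficients of x**0 .. x**7

# cos x and (sin x)/x as lists of coefficients
cos_c = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 0 else Fraction(0)
         for k in range(N)]
sinc_c = [Fraction((-1) ** (k // 2), factorial(k + 1)) if k % 2 == 0 else Fraction(0)
          for k in range(N)]


def divide(num, den, n):
    """Series quotient num/den truncated below degree n (den[0] nonzero)."""
    q = []
    for k in range(n):
        acc = num[k] - sum(q[j] * den[k - j] for j in range(k))
        q.append(acc / den[0])
    return q


def integrate(c):
    """Termwise antiderivative with zero constant term."""
    return [Fraction(0)] + [a / (k + 1) for k, a in enumerate(c)]


x_cot_x = divide(cos_c, sinc_c, N)  # y = x cot x = cos x / ((sin x)/x)
area = integrate(x_cot_x)           # area ABDV as a series in x

print(x_cot_x)
print(area)
```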


Methods of Series and Fluxions 


In 1671 Newton collected and organized his calculus investigations of the 
preceding half dozen years in the comprehensive treatise that is his 
mathematical chef-d’oeuvre, De Methodis Serierum et Fluxionum (Of the 
Methods of Series and Fluxions) ([NP III], pp. 32-353). His first attempts
to publish this major work were unsuccessful, and then abandoned, but it 
was used throughout his life as a primary source of his early results, as (for 
example) in writing his two famous letters to Leibniz in 1676. It was loaned 
on occasion to interested parties, but did not appear in print until 1736 
(after his death). 

The first part of the De Methodis is an augmented version of the De 
Analysi, and includes an elaborate discussion of infinite series techniques 
for the solution of both algebraic and differential equations (the method of 
undetermined coefficients). This is followed by a richly detailed compila- 
tion, under the heading of twelve formally stated problems, of applications 
of Newton’s series and fluxional methods. 

For example, under “Problem 3: To Determine Maxima and Minima”
([NP III], pp. 117-121), he gives the following directions.


When a quantity is greatest or least, at that moment its flow neither 
increases nor decreases: for if it increases, that proves that it was less and 
will at once be greater than it now is, and conversely so if it decreases. 
Therefore seek its fluxion [by previously described methods] and set it 
equal to nothing. 


That is, Newton says to find the points at which f(x) may attain its
maximum or minimum values by solving the equation f’(x) = 0. He includes
a list of nine geometrical problems that can be solved by this
technique, which is now a staple of introductory calculus courses.

In the De Methodis Newton’s conception of fluxions has advanced from 
his original explanation in terms of velocities of moving bodies to the 
mature state that he many years later (see [NP III], p. 17 for reference) 
described as follows. 


I consider time as flowing or increasing by continual flux & other 
quantities as increasing continually in time & from ye fluxion of time I
give the name of fluxions to the velocitys wth wch all other quantities
increase. .. . I expose time by any quantity flowing uniformly & represent 
its fluxion by an unit, & the fluxions of other quantities I represent by any 
other fit symbols ... This Method is derived immediately from Nature 
her self. 


It is important to note that not only the time t but “any quantity flowing
uniformly” can be chosen as the independent variable x. With $\dot{x} = 1$, the
fluxion of any other variable is then its derivative with respect to x.


Applications of Integration by Substitution 


Under Problem 8 ([NP III], pp. 119-209) Newton introduces his formal
technique of integration by substitution. Let v = f(x) and y = g(z) describe
the curves FDH and GEI, respectively (Fig. 12). Then, he says, “imagine
their ordinates DB = v and EC = y to advance erect upon the bases AB = x
and AC = z: The increments, and so the fluxions, of the areas s and t thus
traversed will then be as those ordinates multiplied into their speeds of
advance, that is, into the fluxions of the bases.” Therefore


$$\frac{\dot{s}}{\dot{t}} = \frac{v\dot{x}}{y\dot{z}}. \qquad (20)$$


If we take $\dot{x} = 1$ so $\dot{s} = v$, it follows that $y = \dot{t}/\dot{z}$. If we further assume that
s = t, then $\dot{s} = \dot{t} = v$, so


$$y = \frac{v}{\dot{z}}. \qquad (21)$$


Figure 12

If, finally, z is given as a function of x, $z = \phi(x)$ or $x = \psi(z)$, then (21),
with the right-hand side expressed as a function of z, defines the function
y = g(z) for which the two areas are equal. In particular


$$y = \frac{v}{\dot{z}} = \frac{f(x)}{\phi'(x)} = \frac{f(\psi(z))}{\phi'(\psi(z))}, \qquad y = f(\psi(z))\,\psi'(z),$$


so Newton is simply saying that


$$\int f(x)\,dx$$
transforms to
$$\int f(\psi(z))\,\psi'(z)\,dz$$


under the substitution $x = \psi(z)$.
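As a first example, he takes $v = \sqrt{ax - x^2}$ and $z = \sqrt{ax}$ ($v^2 = ax - x^2$
and $z^2 = ax$). Then $2z\dot{z} = a$, so (21) gives


$$y = \frac{v}{\dot{z}} = \frac{2z}{a}\sqrt{ax - x^2} = \frac{2z^2}{a^2}\sqrt{a^2 - z^2}$$


as the equation of the second curve, so

$$\int \sqrt{ax - x^2}\,dx = \int \frac{2z^2}{a^2}\sqrt{a^2 - z^2}\,dz.$$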
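The equality of the two areas in this first example can be tested numerically. The sketch below uses a composite Simpson rule, which is a modern device and not part of Newton's method; the values a = 2 and x = 1 are arbitrary sample choices, for which the first area is a quarter of the unit disc.

```python
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


a, x_end = 2.0, 1.0            # hypothetical sample values
z_end = math.sqrt(a * x_end)   # upper limit in z, since z = sqrt(a x)

# s: area under v = sqrt(a x - x^2) from 0 to x_end
area_s = simpson(lambda x: math.sqrt(max(a * x - x * x, 0.0)), 0.0, x_end)
# t: area under y = (2 z^2 / a^2) sqrt(a^2 - z^2) from 0 to z_end
area_t = simpson(lambda z: 2 * z * z / a**2 * math.sqrt(max(a * a - z * z, 0.0)),
                 0.0, z_end)

print(area_s, area_t, math.pi / 4)
```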


Newton concludes the section with elegant quadratures of the cissoid, 
cycloid, and Archimedean spiral. These quadratures utilize a powerful 
technique for the transformation of integrals that generalizes the simple 
method of substitution discussed above. 

Instead of assuming that the areas s and t under the curves v = f(x) and
y = g(z), respectively, are equal, we assume a more general relation of the
form


$$t = F(x, v) + ks \qquad (k = \text{constant}). \qquad (22)$$


Upon substitution of


$$\dot{t} = \dot{x}F_x + \dot{v}F_v + k\dot{s} = F_x + \dot{v}F_v + kv$$


into $y = \dot{t}/\dot{z}$,
the substitution $x = \psi(z)$ can be used to express y as a function of z,
thereby obtaining y = g(z) such that (22) holds. In particular


$$y = \frac{F_x\big(\psi(z), f(\psi(z))\big) + f'(\psi(z))\,F_v\big(\psi(z), f(\psi(z))\big) + k f(\psi(z))}{\phi'(\psi(z))}.$$
It may be noted that this transformation incorporates the integration by 


parts technique as well as substitution. For if we take v = f(x) = u(x)w’(x) 
and 


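$$t = u(x)w(x) - s, \qquad (23)$$
then
$$\dot{t} = u(x)w'(x) + u'(x)w(x) - u(x)w'(x) = u'(x)w(x),$$
so
$$y = \frac{\dot{t}}{\dot{z}} = u'(\psi(z))\,w(\psi(z))\,\psi'(z) = g(z).$$
Since


$$s = \int f(x)\,dx, \qquad t = \int g(z)\,dz,$$
(23) therefore gives
$$\int u'(\psi(z))\,w(\psi(z))\,\psi'(z)\,dz = u(x)w(x) - \int u(x)w'(x)\,dx$$
or


$$\int w\,du = uw - \int u\,dw,$$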


the familiar integration by parts formula. 
For example, for his quadrature of the cissoid $v = x^2/\sqrt{ax - x^2}$, Newton
takes $z = \sqrt{a^2 - ax}$ and

$$t = \tfrac{1}{3}x\sqrt{ax - x^2} + \tfrac{2}{3}s. \qquad (24)$$


He then computes $\dot{t} = ax/2\sqrt{ax - x^2}$, so that, apart from sign (z decreases
as x increases), $y = \dot{t}/\dot{z} = \sqrt{a^2 - z^2}$, the equation of a circle. Therefore (24)
reduces the quadrature of a segment of the cissoid to the quadrature of a
segment of the circle.


EXERCISE 12. Verify that $\dot{t} = ax/2\sqrt{ax - x^2}$ in the above computation.


Newton’s Integral Tables 


Newton’s 1671 treatise contains two tables of integrals ([NP III], pp. 237-
255). The first is entitled “A catalogue of some curves related to rectilinear
figures.” It consists of a list of curves y = f(z) such that the corresponding
area t = F(z) can be calculated explicitly by means of direct or inverse
differentiation.

The second table, entitled “A catalogue of some curves related to conic 
sections,” first appeared in print in an appendix (“De Quadratura 
Curvarum’’) to the 1704 Opticks. It consists of a list of curves y = f(x) such 
that the corresponding area can be expressed in terms of the area under an 
appropriate conic section, i.e. in terms of an integral such as 


$$\int \frac{dx}{a + bx} \qquad \text{or} \qquad \int \sqrt{a + bx + cx^2}\,dx.$$


These reductions are obtained by means of Newton’s technique of comparing
the area t under the curve y = f(z) with the area s under the curve
v = g(x), given a “substitution” $x = \psi(z)$.

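For example, given $y = z^{\eta-1}/(e + fz^\eta)$, he substitutes $x = z^\eta$ and considers
the hyperbola


$$v = \frac{1}{e + fx}.$$

Taking $\dot{z} = 1$, it follows that

$$\dot{t} = y = \frac{z^{\eta-1}}{e + fz^\eta}$$
and
$$\dot{s} = v\dot{x} = \frac{\dot{x}}{e + fx} = \frac{\eta z^{\eta-1}}{e + fx},$$

where


$$t = \int y\,dz \qquad \text{and} \qquad s = \int v\,dx.$$


Hence $\dot{t} = \dot{s}/\eta$, so

$$t = \frac{s}{\eta}$$

or


$$\int \frac{z^{\eta-1}\,dz}{e + fz^\eta} = \frac{1}{\eta}\int \frac{dx}{e + fx} \qquad (25)$$


(taking lower limits so that the constant of integration is zero).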
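Given (25), Newton calculates the integrals


$$t_k = \int \frac{z^{k\eta-1}\,dz}{e + fz^\eta} \qquad (k > 1)$$
by recursion, as follows. Taking $\dot{z} = 1$,


$$\dot{t}_k = \frac{z^{k\eta-1}}{e + fz^\eta} = \frac{1}{f}z^{(k-1)\eta-1} - \frac{e}{f}\cdot\frac{z^{(k-1)\eta-1}}{e + fz^\eta} = \frac{1}{f}z^{(k-1)\eta-1} - \frac{e}{f}\,\dot{t}_{k-1},$$
so
$$t_k = \frac{z^{(k-1)\eta}}{(k-1)\eta f} - \frac{e}{f}\,t_{k-1}.$$


For k = 2 and k = 3, this yields


$$\int \frac{z^{2\eta-1}\,dz}{e + fz^\eta} = \frac{z^\eta}{\eta f} - \frac{e}{\eta f}\int \frac{dx}{e + fx}$$


and


$$\int \frac{z^{3\eta-1}\,dz}{e + fz^\eta} = \frac{z^{2\eta}}{2\eta f} - \frac{e z^\eta}{\eta f^2} + \frac{e^2}{\eta f^2}\int \frac{dx}{e + fx},$$


where $x = z^\eta$ as before.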
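The recursion for these integrals lends itself to a direct numerical check. In the sketch below the parameter values e = 1, f = 2, η = 2 and the upper limit z = 1 are arbitrary sample choices, the Simpson rule is a modern stand-in for the exact quadrature, and the lower limit is taken as 0 so that the constant of integration vanishes.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


e, f, eta = 1.0, 2.0, 2.0   # hypothetical sample parameters
z = 1.0                     # upper limit of integration


def t_direct(k):
    """Integral of z**(k*eta - 1) / (e + f z**eta) from 0 to z, by quadrature."""
    return simpson(lambda u: u ** (k * eta - 1) / (e + f * u ** eta), 0.0, z)


def t_recursive(k):
    """The same integral from (25) and the recursion for t_k."""
    x = z ** eta
    if k == 1:
        # (25): (1/eta) times the integral of dx/(e + f x) from 0 to x
        return math.log((e + f * x) / e) / (eta * f)
    return z ** ((k - 1) * eta) / ((k - 1) * eta * f) - (e / f) * t_recursive(k - 1)


for k in (1, 2, 3):
    print(k, t_direct(k), t_recursive(k))
```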

The above integrals constitute the “first order” (or subsection) of New- 
ton’s table. The third of the ten “orders” in the table includes integrals of 
the form 


kn-1 
f ae Si |) 
e+ fz"+ 9277" 
For example, 
n-1 
fF = BF fom, (26) 

e+ fz + 977" 7» 7 

where 
2_ 2 
ae, eee and p2 = 48 +(f* 408) x* | 
e+ fz" + gz" 42? 


The verification of (26) by means of this substitution turns out to be 
somewhat tedious; the following exercise gives a simpler approach to the 
same integral. 


EXERCISE 13. Provide the details for Newton’s alternative evaluation of


$$t = \int \frac{z^{\eta-1}\,dz}{e + fz^\eta + gz^{2\eta}}$$
as follows. First substitute $x = z^\eta$ to obtain


$$t = \frac{1}{\eta}\int \frac{dx}{e + fx + gx^2}.$$
Then use the factorization


$$4g(e + fx + gx^2) = (f + p + 2gx)(f - p + 2gx),$$


where $p^2 = f^2 - 4eg$, to conclude that


$$t = \frac{2g}{p\eta}\int \left(\frac{1}{f - p + 2gx} - \frac{1}{f + p + 2gx}\right) dx.$$

Newton includes a number of area computations illustrating the applica- 
tions of his tables of integrals. The first such example is the quadrature of 
the versiera (later called “Agnesi’s witch”) that he describes as follows. Let
AHQ be a semicircle of diameter 1 on the y-axis as pictured (Fig. 13).
Given a point C on the horizontal z-axis, let E be the point on CI at the
same height as the intersection H of the semicircle AHQ and the diagonal
AI of the rectangle ACIQ. Then E is the typical point on the versiera. It is
easy to calculate


$$y = \frac{1}{1 + z^2},$$


so Newton needs to compute the integral


$$t = \int_0^z \frac{dz}{1 + z^2},$$


which of course equals $\tan^{-1}z$.


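Consulting one of the entries in his tables, he notes that the area s under
the circle $v = \sqrt{1 - x^2}$ transforms to the above area t under the
transformation


$$z = \frac{1}{x}\sqrt{1 - x^2} = \phi(x) \quad \text{or} \quad x = \frac{1}{\sqrt{1 + z^2}} = \psi(z), \qquad t = xv - 2s. \qquad (27)$$


Figure 13

This may be verified as follows.


$$\begin{aligned}
\dot{t} &= v + x\dot{v} - 2\dot{s} = x\dot{v} - v \qquad (\dot{s} = v) \\
&= \frac{-x^2}{\sqrt{1 - x^2}} - \sqrt{1 - x^2} = -\frac{1}{\sqrt{1 - x^2}} = -\frac{\sqrt{1 + z^2}}{z},
\end{aligned}$$
and
$$\dot{z} = -\frac{1}{\sqrt{1 - x^2}} - \frac{\sqrt{1 - x^2}}{x^2} = -\left(z + \frac{1}{z}\right)\sqrt{1 + z^2},$$
so
$$y = \frac{\dot{t}}{\dot{z}} = \frac{1/z}{z + (1/z)} = \frac{1}{1 + z^2}.$$


The limits z = 0 and z = z correspond to x = 1 and $x = 1/\sqrt{1 + z^2}$, so
(27) gives


$$\int_0^z \frac{dz}{1 + z^2} = xv - 2\int_1^x \sqrt{1 - x^2}\,dx = x\sqrt{1 - x^2} + 2\int_{1/\sqrt{1+z^2}}^{1} \sqrt{1 - x^2}\,dx. \qquad (28)$$


Newton expresses this result geometrically as follows. The proportion
$$\frac{AH}{AI} = \frac{CE}{CI}$$


gives $AH = 1/\sqrt{1 + z^2} = x$, so (28) simply says that the area t of ACEQ
(Fig. 13) is twice the area of the circular sector ABD in Figure 14. Since
$\tan\alpha = z$, this means that


$$\int_0^z \frac{dz}{1 + z^2} = \tan^{-1}z$$
as desired.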
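Relation (28) can also be confirmed numerically for a few values of z. The following sketch again relies on a composite Simpson rule (a modern convenience, not Newton's procedure), and the function name `versiera_area` is hypothetical.

```python
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def versiera_area(z):
    """Right-hand side of (28): x sqrt(1 - x^2) + 2 * (circle area from x to 1)."""
    x = 1 / math.sqrt(1 + z * z)
    circle = simpson(lambda u: math.sqrt(max(1 - u * u, 0.0)), x, 1.0)
    return x * math.sqrt(1 - x * x) + 2 * circle


for z in (0.5, 1.0, 3.0):
    print(z, versiera_area(z), math.atan(z))
```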


EXERCISE 14. Newton’s next example is the quadrature of the “kappa curve” which
he describes as the locus of the corner point of a right-angled rule AEF with
unbounded leg AE passing through the origin A, as F moves along the horizontal
y-axis, with the finite leg EF being of unit length. Then the similar triangles AEH
and EFH (Fig. 15) give

$$y = \frac{z^2}{\sqrt{1 - z^2}},$$

so the area AGEC is


$$t = \int_0^z \frac{z^2\,dz}{\sqrt{1 - z^2}}.$$


Show by the transformation method of the previous section that, if s is the area
under the circle $v = \sqrt{1 - x^2}$, and x = z, then


$$t = s - xv.$$


Figure 15


That is,

$$\int_0^z \frac{z^2\,dz}{\sqrt{1 - z^2}} = \int_0^z \sqrt{1 - x^2}\,dx - z\sqrt{1 - z^2}.$$


Arclength Computations 


Under Problem 12: To Determine the Lengths of Curves ([NP III],
pp. 315-329) of “Methods of Series and Fluxions,” Newton applies the
basic fluxional technique for the computation of arclength that he
describes as follows (see Fig. 16).

Figure 16


The fluxion of the length is determined by setting it equal to the square
root of the sum of the squares of the fluxions of the base and the
perpendicular ordinate. For let RN be the line perpendicularly ordinate
upon the base MN, and QR the proposed curve at which RN terminates.
Calling MN = s, NR = t and QR = v, and their respective fluxions $\dot{s}$, $\dot{t}$, $\dot{v}$,
imagine the line NR to move forward to the next closest possible place nr
and then, when the perpendicular Rs is let fall to nr, will Rs, sr and Rr be
contemporaneous moments of the lines MN, NR and QR, by whose
addition they come to be Mn, nr and Qr. Since these are to one another as
the fluxions of the same lines and, because of the right angle Rsr, it is


$\sqrt{Rs^2 + sr^2} = Rr$, therefore $\sqrt{\dot{s}^2 + \dot{t}^2} = \dot{v}$.


That is, he deduces from the “characteristic triangle” Rsr the arclength 
relation that takes the familiar form 


$$\frac{ds}{dt} = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}$$
in terms of rectangular coordinates x and y and arclength s. However, in
the examples that follow, he denotes by z and y the horizontal and vertical
coordinates, and by t the arclength. Setting $\dot{z} = 1$, he therefore has
$\dot{t} = \sqrt{1 + \dot{y}^2}$. For example, if $y = (z^3/a^2) + (a^2/12z)$, then
$\dot{t} = [1 + [(3z^2/a^2) - (a^2/12z^2)]^2]^{1/2} = (3z^2/a^2) + (a^2/12z^2)$, so
$t = (z^3/a^2) - (a^2/12z)$. Newton points out that, if $Ab = \tfrac{1}{2}a$ (Fig. 17),
substitution of $z = \tfrac{1}{2}a$ gives $t = -a/24$, so the length of dD (where AB = z) is
$$\frac{z^3}{a^2} - \frac{a^2}{12z} - \left(-\frac{a}{24}\right) = \frac{z^3}{a^2} - \frac{a^2}{12z} + \frac{a}{24}.$$
This is an example of “choosing the constant of integration.”
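Newton's closed form for this arclength can be compared with the length of an inscribed polygon. The sketch below takes a = 1 and measures from z = a/2 to z = 2; these sample values and the function names are assumptions made only for illustration.

```python
import math


def curve(z, a):
    """The curve y = z^3/a^2 + a^2/(12 z) of Newton's example."""
    return z**3 / a**2 + a**2 / (12 * z)


def polygon_length(a, z0, z1, n=100000):
    """Length of the polygon inscribed in the curve over [z0, z1]."""
    total, prev = 0.0, (z0, curve(z0, a))
    for i in range(1, n + 1):
        z = z0 + (z1 - z0) * i / n
        point = (z, curve(z, a))
        total += math.hypot(point[0] - prev[0], point[1] - prev[1])
        prev = point
    return total


def newton_length(a, z):
    """Arclength measured from z = a/2: z^3/a^2 - a^2/(12 z) + a/24."""
    return z**3 / a**2 - a**2 / (12 * z) + a / 24


a = 1.0  # hypothetical sample value
print(polygon_length(a, a / 2, 2.0), newton_length(a, 2.0))
```

The added constant a/24 is exactly what makes the formula vanish at the starting point z = a/2.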
In order to rectify the semicubical parabola $z^3 = ay^2$, Newton calculates


$$\dot{t} = \sqrt{1 + \dot{y}^2} = \sqrt{1 + \frac{9z}{4a}},$$
whence

$$t = \frac{8a}{27}\left(1 + \frac{9z}{4a}\right)^{3/2}$$

from his table of integrals.


EXERCISE 15. In order to rectify the curve $z^4 = ay^3$, Newton first calculates


$$\dot{t} = \sqrt{1 + \dot{y}^2} = \sqrt{1 + \frac{16z^{2/3}}{9a^{2/3}}};$$
that is, 
2/3 
t ee 16z 
4 9q2/3 


Now show that the area s under the hyperbola


$$v = \sqrt{x + \frac{16x^2}{9a^{2/3}}}$$


and the area ¢ under the curve 


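Consequently the length (from z = 0 to z = z) of the curve $z^4 = ay^3$ is equal to a
hyperbolic area (up to the constant factor 3/2 produced by the substitution $x = z^{2/3}$),


$$t = \int_0^z \sqrt{1 + \frac{16z^{2/3}}{9a^{2/3}}}\,dz = \frac{3}{2}\int_0^{z^{2/3}} \sqrt{x + \frac{16x^2}{9a^{2/3}}}\,dx.$$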


In his Example 5 Newton gives the first rectification of the “cissoid of
the ancients,” described in rectangular coordinates by


$$y = \frac{(a - z)^2}{\sqrt{z(a - z)}},$$

so that

$$\dot{y} = -\tfrac{1}{2}z^{-3/2}(a - z)^{1/2}(a + 2z), \qquad \dot{t} = \sqrt{1 + \dot{y}^2} = \tfrac{1}{2}az^{-3/2}\sqrt{a + 3z}.$$


In order to find t, he refers to an entry in his catalogue of integrals for the
fact that the area s under the hyperbola $v = \sqrt{a^2 + 3x^2}$ corresponds to the
area t under the curve


$$y = \tfrac{1}{2}az^{-3/2}\sqrt{a + 3z}$$


under the transformation 


a 
= — a*+3x? ; 
xX 


Upon substitution of $v = \sqrt{a^2 + 3x^2}$ and simplification, it follows that


= 52 Vart3z ; 


as desired, since x? = az. 
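Thus Newton has expressed the length


$$t = \int_z^a \tfrac{1}{2}az^{-3/2}\sqrt{a + 3z}\,dz$$


of the cissoid over the interval [z, a] in terms of the area s (Fig. 18) under
the hyperbola $v = \sqrt{a^2 + 3x^2}$ over the interval $[x, a]$ with $x = \sqrt{az}$,


$$t = \frac{6}{a}\int_{\sqrt{az}}^{a} \sqrt{a^2 + 3x^2}\,dx - \left[\frac{(a^2 + 3x^2)^{3/2}}{ax}\right]_{\sqrt{az}}^{a}.$$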


In his Example 8 Newton derives an infinite series for the arclength of 
the ellipse 


$$y^2 + bz^2 = a^2.$$
Differentiation gives


$$\dot{y} = \frac{-bz}{\sqrt{a^2 - bz^2}},$$
so
$$\begin{aligned}
\dot{t} = \sqrt{1 + \dot{y}^2} &= \sqrt{\frac{a^2 + (b^2 - b)z^2}{a^2 - bz^2}} \\
&= \sqrt{1 + \frac{b^2}{a^2}z^2 + \frac{b^3}{a^4}z^4 + \frac{b^4}{a^6}z^6 + \cdots} \\
&= 1 + \frac{b^2}{2a^2}z^2 + \frac{4b^3 - b^4}{8a^4}z^4 + \frac{8b^4 - 4b^5 + b^6}{16a^6}z^6 + \cdots
\end{aligned}$$


by division and root extraction. Termwise antidifferentiation then gives


$$t = z + \frac{b^2}{6a^2}z^3 + \frac{4b^3 - b^4}{40a^4}z^5 + \frac{8b^4 - 4b^5 + b^6}{112a^6}z^7 + \cdots$$
for the length of the arc of the ellipse over the interval [0, z].
In his final arclength example in the De Methodis, Newton uses his series 


$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$


(from the De Analysi) to rectify the quadratrix


$$y = x\cot\frac{x}{a} = \frac{x\sqrt{1 - \sin^2(x/a)}}{\sin(x/a)}.$$
By root extraction and division of series he obtains 
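$$y = a - \frac{x^2}{3a} - \frac{x^4}{45a^3} - \frac{2x^6}{945a^5} - \cdots,$$

Figure 19

so
$$\dot{y} = -\frac{2x}{3a} - \frac{4x^3}{45a^3} - \frac{4x^5}{315a^5} - \cdots.$$
Then
$$\dot{t} = \sqrt{1 + \dot{y}^2} = 1 + \frac{2x^2}{9a^2} + \frac{14x^4}{405a^4} + \frac{604x^6}{127{,}575a^6} + \cdots,$$

so


$$t = x + \frac{2x^3}{27a^2} + \frac{14x^5}{2025a^4} + \frac{604x^7}{893{,}025a^6} + \cdots$$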

is an infinite series expansion for the arc of the quadratrix above the 
interval [0, x] (Fig. 19). 
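The chain of series operations in this rectification (division, differentiation, squaring, root extraction, integration) can be replayed in exact arithmetic with a = 1. The sketch below is a modern restatement with hypothetical helper names; the square-root recurrence simply matches coefficients in $s^2 = w$.

```python
from fractions import Fraction
from math import factorial

N = 8  # coefficients of x**0 .. x**7, with a = 1


def divide(num, den, n):
    """Series quotient num/den truncated below degree n."""
    q = []
    for k in range(n):
        q.append((num[k] - sum(q[j] * den[k - j] for j in range(k))) / den[0])
    return q


def mul(p, q, n):
    """Series product truncated below degree n."""
    r = [Fraction(0)] * n
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < n:
                r[i + j] += a * b
    return r


def sqrt_series(w, n):
    """Square root of a series whose constant term is 1."""
    s = [Fraction(1)]
    for k in range(1, n):
        cross = sum(s[j] * s[k - j] for j in range(1, k))
        s.append((w[k] - cross) / 2)
    return s


cos_c = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 0 else Fraction(0)
         for k in range(N)]
sinc_c = [Fraction((-1) ** (k // 2), factorial(k + 1)) if k % 2 == 0 else Fraction(0)
          for k in range(N)]

y = divide(cos_c, sinc_c, N)                 # y = x cot x
y_dot = [k * y[k] for k in range(1, N)]      # termwise derivative
w = mul(y_dot, y_dot, N - 1)
w[0] += 1                                    # 1 + y_dot**2
t_dot = sqrt_series(w, N - 1)                # fluxion of the arclength
t = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(t_dot)]

print(t_dot)
print(t)
```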


EXERCISE 16. Set a = 1 and carry out the above series computations to verify
Newton’s result. 


The Newton—Leibniz Correspondence 


As we saw in Chapter 7, Newton announced his binomial series and 
described its discovery in the two letters of 1676 that he sent to Henry 
Oldenburg, secretary of the Royal Society of London, for transmission to 
Leibniz. These famous letters, the epistola prior dated June 13, 1676 ([NC
II], pp. 32-41) and the epistola posterior dated October 24, 1676 ([NC II],
pp. 130-149), served to establish Newton’s priority for many of his early 
calculus applications that we have recounted from the De Analysi and the 
De Methodis. While concealing his basic fluxional techniques by means of 
anagrams—an unfortunate device that was not altogether unknown in the 
seventeenth century—he listed in these letters a large number of the 
infinite series, quadrature, and rectification results whose derivations we 
have described in this chapter.

An additional item of interest in this correspondence concerns quadratures
of the circle that result in infinite series representations of the number
$\pi$. In his reply to the epistola prior, Leibniz presented the alternating series

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \qquad (29)$$

that now bears his name ([NC II], pp. 65-71). Contrary to common
assumption, Leibniz did not obtain this result directly from the integral

$$\frac{\pi}{4} = \int_0^1 \frac{dx}{1 + x^2}$$

by termwise integration after expansion of the integrand as a geometric
series. As will be explained in Chapter 9, Leibniz’ derivation of (29) was an
application of his general “transmutation” method for the transformation
of integrals.

In reply in the epistola posterior, Newton mentions his own table of
integrals for “comparison of curves with conic sections,” including in
particular the integral of


$$\frac{z^{\eta-1}}{e + fz^\eta + gz^{2\eta}}$$


(see Exercise 13). He first remarks that with f = 0 and $\eta = 1$ this gives
Leibniz’ series (29) (evidently by the inverse tangent approach mentioned
above), and then presents the series


$$1 + \frac{1}{3} - \frac{1}{5} - \frac{1}{7} + \frac{1}{9} + \frac{1}{11} - \frac{1}{13} - \frac{1}{15} + \cdots \qquad (30)$$


“for the length of the quadrantal arc of which the chord is unity (i.e. for
$\pi/2\sqrt{2}$), or, what is the same thing, this:

$$\frac{1}{2} + \frac{1}{15} - \frac{1}{63} + \frac{1}{143} - \cdots$$
for the half of its length. And these perhaps, since they are just as simple as
the others and converge more rapidly, you and your friends will not
disdain. But I for my part regard the matter otherwise. For that is better
which is more useful and solves the problem with less labor” ([NC II],
pp. 138-139).


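EXERCISE 17. Provide, as follows, the details in Newton’s likely derivation of the
series


$$\frac{\pi}{2\sqrt{2}} = 1 + \frac{1}{3} - \frac{1}{5} - \frac{1}{7} + \frac{1}{9} + \frac{1}{11} - \cdots.$$
(a) Show that
$$\int_0^1 \frac{1 + x^2}{1 + x^4}\,dx = 1 + \frac{1}{3} - \frac{1}{5} - \frac{1}{7} + \frac{1}{9} + \frac{1}{11} - \cdots$$


by expanding $1/(1 + x^4)$ as a geometric series and integrating termwise.
(b) Noting that


$$\frac{1 + x^2}{1 + x^4} = \frac{1/2}{1 + \sqrt{2}\,x + x^2} + \frac{1/2}{1 - \sqrt{2}\,x + x^2},$$


conclude that


$$\int_0^1 \frac{1 + x^2}{1 + x^4}\,dx = \frac{1}{2}\int_{-1}^{1} \frac{dx}{1 + \sqrt{2}\,x + x^2}.$$
Substitute $x + (\sqrt{2}/2) = (1/\sqrt{2})\tan\theta$ to obtain

$$\int_0^1 \frac{1 + x^2}{1 + x^4}\,dx = \frac{\pi}{2\sqrt{2}}$$

as desired.
(c) Finally group terms as


$$1 + \left(\frac{1}{3} - \frac{1}{5}\right) - \left(\frac{1}{7} - \frac{1}{9}\right) + \left(\frac{1}{11} - \frac{1}{13}\right) - \cdots$$


and find common denominators.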
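A quick numerical comparison illustrates Newton's remark about speed of convergence. The sketch below sums series (30) two terms at a time and the grouped series for half the arc; the function names and the numbers of terms are arbitrary choices.

```python
import math


def newton_series(pairs):
    """Partial sum of 1 + 1/3 - 1/5 - 1/7 + 1/9 + 1/11 - ..., two terms at a time."""
    return sum((-1) ** k * (1 / (4 * k + 1) + 1 / (4 * k + 3)) for k in range(pairs))


def half_series(terms):
    """Partial sum of 1/2 + 1/15 - 1/63 + 1/143 - ... (general term 1/(16 k^2 - 1))."""
    return 0.5 + sum((-1) ** (k + 1) / (16 * k * k - 1) for k in range(1, terms + 1))


target = math.pi / (2 * math.sqrt(2))
print(newton_series(100000), target)     # slow: error comparable to the last pair
print(half_series(1000), target / 2)     # grouped form: far fewer terms needed
```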


The Calculus and the Principia Mathematica 


The magisterial Philosophiae Naturalis Principia Mathematica (Mathemati- 
cal Principles of Natural Philosophy) was published in 1687. This founding 
document of modern exact science sets forth in comprehensive detail 
Newton’s system of mechanics and theory of gravitation. 

The Principia bristles with infinitesimal considerations and limit argu- 
ments, and is therefore sometimes regarded as Newton’s first published 
account of the calculus. However, its exposition is couched almost entirely 
in the language and form of classical synthetic geometry, and makes little 
or no significant use of the algorithmic computational machinery of 
Newton’s calculus of fluxions. The traditional view is that Newton first 
discovered the basic propositions of the Principia by means of fluxional 
analyses and computations, and later clothed them in the accepted dress of 
synthetic geometry, presumably in an effort to avoid controversy (“to 
avoid being baited by little smatterers in Mathematicks”). However, 
according to Whiteside, “it is futile to plough laboriously through the 
voluminous mass of Newton’s extant papers (containing 10-15 million 
words at a conservative estimate) in search of manuscripts bearing dotted 
fluxional arguments which reappear, suitably recast in geometrical mould, 
in the pages of the first edition of the Principia. ... Nowhere, let me 
repeat, are there to be found extant autograph manuscripts of Newton’s, 
preceding the Principia in time, which could conceivably buttress the
conjecture that he first worked the proofs in that book by fluxions before
remoulding them in traditional geometrical form. Nor in all the many
thousands of such sheets relating to the composition and revision of the
Principia is there the faint trace of a suggestion that such papers ever
existed” ([7], pp. 9-10).

Figure 20

What Newton did, in fact, need and use throughout the Principia was a 
facility for dealing with limits of ratios of geometrical quantities. For 
example, Lemma VII in Section I of Book I states that, given a chord AB 
of the arc AB of a curve, and a corresponding tangent segment AD 
(Fig. 20), “if the points A and B approach one another and meet,” then 
“the ultimate ratio of the arc, chord, and tangent, any one to any other, is 
the ratio of equality.” The Scholium to Section I specifies that


By the ultimate ratio of evanescent quantities (i.e., ones that are approach- 
ing zero) is to be understood the ratio of the quantities not before they 
vanish, nor afterwards, but with which they vanish.... Those ultimate 
ratios with which quantities vanish are not truly the ratios of ultimate 
quantities, but limits towards which the ratios of quantities decreasing 
without limit do always converge; and to which they approach nearer 
than by any given difference, but never go beyond, nor in effect attain to, 
till the quantities are diminished in infinitum. 


Thus the “ultimate ratio of evanescent quantities” is simply the limit of 
their ratio. Lemma I of Book I of the Principia is, in effect, Newton’s 
attempted definition of the limit concept: “Quantities, and the ratios of 
quantities, which in any finite time converge continually to equality, and 
before the end of that time approach nearer to each other than by any 
given difference, become ultimately equal.” In modern notation, we would 
say that if, given $\varepsilon > 0$, it follows that f(t) and g(t) differ by less than $\varepsilon$ for t
sufficiently close to a, then


$$\lim_{t \to a} f(t) = \lim_{t \to a} g(t).$$


Although Newton did not make explicit and systematic use of the fluxional
calculus in the Principia, he provided in these passages his clearest
exposition of the limit concept on which that calculus is based.

Newton’s apparent use of infinitesimals or indivisibles in subsequent 
portions of the Principia has frequently been a source of confusion or 
misunderstanding. However, he warned in the Scholium to Section I of 
Book I that, “if hereafter I should happen to consider quantities as made 
up of particles, or should use little curved lines for right (straight) ones, I 
would not be understood to mean indivisibles but evanescent divisible 
quantities; not the sums and ratios of determinate parts, but always the 
limits of sums and ratios; and that the force of such demonstrations always 
depends on the method laid down in the foregoing Lemmas.” In the 
introduction to the De Quadratura (discussed in the next section) he 
similarly stressed that 


By like ways of arguing, and by the method of Prime and Ultimate 
Ratios, may be gathered the Fluxions of Lines, whether Right or Crooked 
in all cases whatever, as also the Fluxions of Surfaces, Angles and other 
Quantities. In Finite Quantities so to frame a Calculus, and thus to 
investigate the Prime and Ultimate Ratios of Nascent or Evanescent 
Finite Quantities, is agreeable to the Geometry of the Ancients; and I was 
willing to shew, that in the Method of Fluxions there’s no need of 
introducing Figures infinitely small into Geometry. For this Analysis may 
be performed in any Figures whatsoever, whether finite or infinitely small, 
so they are but imagined to be similar to the Evanescent Figures; as also 
in Figures which may be reckoned as infinitely small, if you do but 
proceed cautiously. 


In other words, Newton says, exposition in terms of indivisibles or infini- 
tesimals is simply a convenient shorthand (but not a substitute) for 
rigorous mathematical proof in terms of ultimate ratios (limits). 


Newton’s Final Work on the Calculus 


Of Newton’s several treatises on the calculus, the last written but first 
published was the De Quadratura Curvarum (On the Quadrature of 
Curves). This severely technical exposition of Newton’s mature calculus of 
fluxions was written in 1691-1693 (rather than 1676 as stated in most 
histories of mathematics) and appeared as a mathematical appendix to the 
1704 edition of his Opticks. 

In the epistola posterior Newton had stated without proof the following 
“prime theorem” concerning the squaring of curves. The area under the 
curve 


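$$y = x^\theta(e + fx^\eta)^\lambda$$
is


$$Q\left\{\frac{x^\pi}{s} - \frac{r-1}{s-1}\cdot\frac{eA}{fx^\eta} + \frac{r-2}{s-2}\cdot\frac{eB}{fx^\eta} - \frac{r-3}{s-3}\cdot\frac{eC}{fx^\eta} + \cdots\right\}, \qquad (31)$$
where
$$Q = \frac{(e + fx^\eta)^{\lambda+1}}{\eta f}, \qquad r = \frac{\theta + 1}{\eta}, \qquad s = \lambda + r, \qquad \pi = \eta(r - 1),$$
and the letters A, B, C, ..., denote the immediately preceding terms, that
is,
$$A = \frac{x^\pi}{s}, \qquad B = \frac{r-1}{s-1}\cdot\frac{eA}{fx^\eta}, \qquad C = \frac{r-2}{s-2}\cdot\frac{eB}{fx^\eta}, \ \ldots.$$


In equivalent summation notation,


$$\int x^\theta(e + fx^\eta)^\lambda\,dx = \frac{Qx^\pi}{s}\left[1 + \sum_{k=1}^{\infty} (-1)^k \frac{(r-1)(r-2)\cdots(r-k)}{(s-1)(s-2)\cdots(s-k)}\cdot\frac{e^k}{f^k x^{k\eta}}\right]. \qquad (31')$$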
If r is a positive integer, then this is a finite sum with r terms; otherwise it 
is an infinite series whose convergence requires discussion (which Newton 
does not provide). 
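For example, if $y = x^n = x^0(0 + x^1)^n$, then $\theta = 0$, $e = 0$, $f = \eta = 1$, $\lambda = n$ and
$Q = x^{n+1}$, $r = 1$, $s = n + 1$, $\pi = 0$, so (31) reduces to a single term,


$$\int x^n\,dx = \frac{x^{n+1}}{n + 1}.$$


If $y = x/(1 - 2x^2 + x^4) = x(1 - x^2)^{-2}$, then $\theta = 1$, $e = 1$, $f = -1$, $\eta = 2$, $\lambda = -2$
and $Q = -\tfrac{1}{2}(1 - x^2)^{-1}$, $r = 1$, $s = -1$, $\pi = 0$, so (31) gives


$$\int \frac{x\,dx}{1 - 2x^2 + x^4} = \frac{1}{2(1 - x^2)}.$$


Alternatively, if we write $y = x^{-3}(-1 + x^{-2})^{-2}$, then $\theta = -3$, $e = -1$, $f = 1$,
$\eta = -2 = \lambda$ and $Q = -\tfrac{1}{2}(-1 + x^{-2})^{-1}$, $r = 1$, $s = -1$, $\pi = 0$, so (31) gives


$$\int \frac{x\,dx}{1 - 2x^2 + x^4} = \frac{x^2}{2(1 - x^2)}.$$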


The two antiderivatives, obtained by application of (31) to different 
expressions for the same integrand function, differ by a “constant of 
integration.” 
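When r is a positive integer the series (31′) terminates, and the resulting closed form can be checked by differentiating it numerically. The sketch below is a floating-point illustration only; the function name, its argument order, and the sample parameters (the second example above, and the integrand $x^{-4}(1 + x^{-1})^{1/2}$ with r = 3) are choices made here for the test.

```python
def prime_theorem_area(x, theta, e, f, eta, lam, terms):
    """Finite sum (31') for the area under y = x**theta * (e + f x**eta)**lam."""
    r = (theta + 1) / eta
    s = lam + r
    pi_exp = eta * (r - 1)          # Newton's exponent pi (not the number 3.14...)
    q = (e + f * x**eta) ** (lam + 1) / (eta * f)
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff
        # ratio of term k+1 to term k in (31')
        coeff *= -(r - (k + 1)) / (s - (k + 1)) * e / (f * x**eta)
    return q * x**pi_exp / s * total


def integrand(x, theta, e, f, eta, lam):
    return x**theta * (e + f * x**eta) ** lam


# second example: theta = 1, e = 1, f = -1, eta = 2, lambda = -2, so r = 1
print(prime_theorem_area(0.5, 1, 1, -1, 2, -2, terms=1))   # 1/(2(1 - x^2)) at x = 1/2

# a case with r = 3: y = x**-4 * (1 + x**-1)**(1/2)
params = (-4, 1, 1, -1, 0.5)
h = 1e-5
slope = (prime_theorem_area(2 + h, *params, terms=3)
         - prime_theorem_area(2 - h, *params, terms=3)) / (2 * h)
print(slope, integrand(2, *params))
```

The central difference of the computed area agrees with the integrand, which is the fundamental-theorem check that the finite sum really is an antiderivative.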

The following two exercises correspond to additional examples that 
Newton gives in the epistola posterior.

EXERCISE 18. Write $\sqrt{x + x^2}/x^5 = x^{-4}(1 + x^{-1})^{1/2}$ and apply (31) to obtain


$$\int \frac{\sqrt{x + x^2}}{x^5}\,dx = -\frac{2(1 + x)(8x^2 - 12x + 15)}{105x^4}\sqrt{x + x^2}.$$


EXERCISE 19. Write


$$\frac{x^{1/3}}{\sqrt[5]{1 - 3x^{2/3} + 3x^{4/3} - x^2}} = x^{1/3}(1 - x^{2/3})^{-3/5}$$


and apply (31) to obtain


$$\int \frac{x^{1/3}\,dx}{\sqrt[5]{1 - 3x^{2/3} + 3x^{4/3} - x^2}} = -\frac{30x^{2/3} + 75}{28}(1 - x^{2/3})^{2/5}.$$


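The following exercise outlines a proof of Newton’s “prime theorem” that is
suggested by the methods of the De Quadratura (see [NP VII], p. 28, note (21)).


EXERCISE 20. If
$$I_i = \int x^{\theta - i\eta}(e + fx^\eta)^\lambda\,dx, \qquad i = 0, 1, 2, \ldots,$$


then (31’) gives the value of $I_0$.
(a) Show that the derivative of $x^{\pi-(i-1)\eta}(e + fx^\eta)^{\lambda+1}$ is


$$\eta(e + fx^\eta)^\lambda\left[(r - i)e\,x^{\theta - i\eta} + (s - i + 1)f\,x^{\theta-(i-1)\eta}\right].$$
Conclude by antidifferentiation that


$$I_{i-1} = \frac{Qx^{\pi-(i-1)\eta}}{s - i + 1} - \frac{(r - i)e}{(s - i + 1)f}\,I_i. \qquad (32)$$


(b) Show by repeated application of the recursion formula (32) that


$$I_0 = \frac{Qx^\pi}{s}\left[1 + \sum_{k=1}^{n-1} (-1)^k \frac{(r-1)(r-2)\cdots(r-k)}{(s-1)(s-2)\cdots(s-k)}\cdot\frac{e^k}{f^k x^{k\eta}}\right] + (-1)^n \frac{(r-1)(r-2)\cdots(r-n)\,e^n}{s(s-1)\cdots(s-n+1)\,f^n}\,I_n. \qquad (33)$$


This provides the remainder term that must be investigated in order to establish
convergence of (31’) in case r is not a positive integer.
(c) Similarly, obtain the following expansion in ascending powers of $x^\eta$:


$$I_0 = \frac{Qfx^{\pi+\eta}}{re}\left[1 + \sum_{k=1}^{n-1} (-1)^k \frac{(s+1)(s+2)\cdots(s+k)}{(r+1)(r+2)\cdots(r+k)}\cdot\frac{f^k x^{k\eta}}{e^k}\right] + (-1)^n \frac{(s+1)(s+2)\cdots(s+n)\,f^n}{r(r+1)(r+2)\cdots(r+n-1)\,e^n}\,I_{-n}. \qquad (34)$$

The De Quadratura contains the following generalization of (34); see
([NW I], p. 145 or [NP VII], p. 521). Let
$$R = e + fx^\eta + gx^{2\eta} + hx^{3\eta} + \cdots,$$
$$S = a + bx^\eta + cx^{2\eta} + dx^{3\eta} + \cdots,$$
$$r = \theta/\eta, \qquad s = r + \lambda, \qquad t = s + \lambda, \qquad v = t + \lambda, \ \ldots.$$


Then


$$\int x^{\theta-1}R^{\lambda-1}S\,dx = x^\theta R^\lambda\left[\frac{a/\eta}{re} + \frac{b/\eta - sfA}{(r+1)e}\,x^\eta + \frac{c/\eta - (s+1)fB - tgA}{(r+2)e}\,x^{2\eta} + \frac{d/\eta - (s+2)fC - (t+1)gB - vhA}{(r+3)e}\,x^{3\eta} + \cdots\right] \qquad (35)$$
where each A, B, C, ..., is the coefficient of the preceding power of x,
that is,
$$A = \frac{a/\eta}{re}, \qquad B = \frac{b/\eta - sfA}{(r+1)e}, \qquad C = \frac{c/\eta - (s+1)fB - tgA}{(r+2)e}, \ \ldots.$$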
EXERCISE 21. Write
$$\frac{x^5 + x^4 - 8x^3}{(x - 1)^3(x + 2)^2} = \frac{x^6 - 9x^4 + 8x^3}{(x - 1)^4(x + 2)^2} = x^3(2 - 3x + x^3)^{-2}(8 - 9x + x^3)$$
and apply (35) to obtain
$$\int \frac{x^5 + x^4 - 8x^3}{(x - 1)^3(x + 2)^2}\,dx = \frac{x^4}{x^3 - 3x + 2}.$$

EXERCISE 22. Apply (35) to obtain 
(22S eo - 3x(1 + x?) 


(l+x—x?— x3)? (+x—x2— x3)! 


EXERCISE 23. Show that the infinite series corresponding to (34) is a special case of 
(35). 


As Hadamard has remarked, the De Quadratura “brings the integration 
of rational functions to a state hardly inferior to what it is now” ([2], p. 41).
In 1676 Newton himself had written, in a letter to John Collins ([NC II],
p. 179),


There is no curve line exprest by any equation of three terms—but I can 
in less then half a quarter of an hower tell whether it may be squared or 
what are ye simplest figures it may be compared wth, be those figures 
Conic sections or others. And then by a direct & short way (I dare say ye 
shortest ye nature of ye thing admits of for a general one) I can compare 
them. And so if any two figures exprest by such equations be propounded 
I can by ye same rule compare them if they may be compared. This may 
seem a bold assertion because it’s hard to say a figure may or may not be 
squared or compared with another, but it’s plain to me by ye fountain I 
draw it from, though I will not undertake to prove it to others. 


And in the same dozen years from 1664 to 1676 he had discovered the law 
of universal gravitation, explained the color spectrum of the rainbow, 
invented and built reflecting telescopes, and devoted inordinate amounts 
of time to smoky chemical experiments!


The Calculus According to Leibniz


Gottfried Wilhelm Leibniz (1646-1716) 


In the century of Kepler, Galileo, Descartes, Pascal, and Newton, the most 
versatile genius of all was Gottfried Wilhelm Leibniz. He was born at 
Leipzig, entered the university there at the age of fifteen, and received his 
bachelor’s degree at seventeen. He continued his studies in logic, philoso- 
phy and law, and at twenty completed a brilliant thesis on the historical 
approach to teaching law. When the University of Leipzig denied his 
application for a doctorate in law because of his youth, he transferred to 
the University of Altdorf in Nuremberg, and received his doctorate in
law there in 1667.

Upon the completion of his academic work, Leibniz entered the political 
and governmental service of the Elector of Mainz. His serious study of 
mathematics did not begin until 1672 (at the age of twenty-six) when he 
was sent to Paris on a diplomatic mission. The following four years that he 
spent in Paris were Leibniz’ “prime age of invention” in mathematics 
(similar to Newton’s 1664-66 period). During his stay in Paris he con- 
ceived the principal features of his own version of the calculus, an 
approach that he elaborated during the balance of his life, and which 
during the eighteenth century was dominant over Newton’s approach. In 
1676 he returned to Germany, and served for the next forty years as 
librarian and councillor to the Elector of Hanover. Although his profes- 
sional career was devoted mainly to law and diplomacy, the breadth of his 
fundamental contributions—to diverse areas of mathematics, philosophy, 
and science—is probably not matched by the work of any subsequent 
scholar. 


In regard to the calculating machine that he built during the Paris years, 
Leibniz remarked, “It is unworthy of excellent men to lose hours like 
slaves in the labor of calculation which could safely be relegated to anyone 
else if machines were used.” A lifelong project was his search for a 
universal language or symbolic logic that would standardize and 
mechanize not only numerical computations but all processes of rational 
human thought, and would eliminate the mental labor of routine and 
repetitive steps. His goal was the creation of a system of notation and 
terminology that would codify and simplify the essential elements of 
logical reasoning so as to 


furnish us with an Ariadne’s thread, that is to say, with a certain sensible 
and palpable medium, which will guide the mind as do the lines drawn in 
geometry and the formulas for operations, which are laid down for the 
learner in arithmetic (quoted by Baron [1], p. 9). 


Such a universal “characteristic” or language, he hoped, would provide all 
educated people—not just the fortunate few—with the powers of clear and 
correct reasoning. 

Apparently the formulation of this far-reaching goal antedated Leibniz’ 
serious interest in or detailed knowledge of mathematics. But, as Hofmann 
remarks, “A man who places such thoughts into the forefront of his mind 
has mathematics in his blood even if he is still ignorant of its detail” ([7], 
p. 2). Indeed, it was precisely (and only) in mathematics that Leibniz fully 
accomplished his goal. His infinitesimal calculus is the supreme example, 
in all of science and mathematics, of a system of notation and terminology 
so perfectly mated with its subject as to faithfully mirror the basic logical 
operations and processes of that subject. It is hardly an exaggeration to say 
that the calculus of Leibniz brings within the range of an ordinary student 
problems that once required the ingenuity of an Archimedes or a Newton. 
Perhaps the best measure of its triumph is the fact that today we can 
scarcely discuss the results of Leibniz’ predecessors without restating them 
in his differential notation and terminology (as in our discussion of 
Newton’s work in Chapter 8). 

A few examples will indicate what Leibniz meant by symbolic notation 
as a “sensible and palpable medium, which will guide the mind” to correct 
conclusions. In the functional notation introduced much later by 
Lagrange, the chain rule says that, if h(x) = f(g(x)), then 


$$ h'(x) = f'(g(x))\,g'(x). \tag{1} $$

Nothing about the notation in Formula (1) suggests why it is true, nor how 
to prove it. But in differential notation, with z = f(y) and y = g(x), 
Formula (1) becomes 

$$ \frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}. \tag{2} $$

Figure 1 

This formula, by contrast, conspicuously suggests its own validity, by 
cancellation of the right-hand side differential dy’s as though they were 
real numbers. This symbolic cancellation of differentials also suggests a 
logical proof of the formula: replace the differentials dx, dy, dz by 
the finite increments Δx, Δy, Δz and proceed to the limit. 
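The finite-increment reading of Formula (2) can be tried numerically. The sketch below is only an illustration: the function name `increment_ratios` and the sample functions z = sin y, y = x² are assumptions chosen for the example, not anything taken from Leibniz.

```python
import math

def increment_ratios(f, g, x, dx):
    """Return (dz/dx, dz/dy, dy/dx) as ratios of finite increments at x.

    Assumes the increment dy is nonzero, so that dz/dy makes sense.
    """
    y0, y1 = g(x), g(x + dx)
    z0, z1 = f(y0), f(y1)
    dy, dz = y1 - y0, z1 - z0
    return dz / dx, dz / dy, dy / dx

# Hypothetical example: z = sin(y), y = x**2, examined at x = 1.
for dx in (1e-1, 1e-3, 1e-5):
    dz_dx, dz_dy, dy_dx = increment_ratios(math.sin, lambda x: x * x, 1.0, dx)
    # For finite increments the cancellation is literal arithmetic;
    # both printed values approach 2*cos(1) as dx shrinks.
    print(dx, dz_dx, dz_dy * dy_dx)
```

For finite increments the two printed columns agree up to rounding, and the passage to the limit is what turns this arithmetic identity into the chain rule.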

The integral version of the chain rule is the formula for integration by 
substitution, 


$$ \int f(g(x))\,g'(x)\,dx = \int f(u)\,du. \tag{3} $$


The symbolic substitution u = g(x), du= g’(x) dx makes Formula (3) seem 
inevitable, whatever its proof may be. This amounts to the invariance of 
the differential form f(u) du with respect to arbitrary changes of variable 
—one of Leibniz’ most important discoveries. 

Now consider a surface that is generated by revolving the curve y = f(x) 
around the x-axis. Thinking of an infinitesimal segment ds of the curve as 
the hypotenuse of the “characteristic triangle” with sides dx and dy (see 
Fig. 1), the Pythagorean theorem gives 


$$ ds = \sqrt{(dx)^2 + (dy)^2} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx. $$


When this segment ds is revolved around the x-axis in a circle of radius y, 
it generates an infinitesimal area 


$$ dA = 2\pi y\,ds = 2\pi y \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx. $$


Adding up the infinitesimal areas, we obtain 


$$ A = \int dA = \int 2\pi y \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx \tag{4} $$


for the area of the surface. Thus we “discover” the correct Formula (4) by 
a quite routine and plausible manipulation of Leibniz’ symbols. By contrast, 
its rigorous justification would require a detailed discussion and 
definition of the concept of surface area, followed by a proof (perhaps in 
terms of Riemann sums) that Formula (4) agrees with this definition. 
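As a plausibility check on Formula (4), one can compare a Riemann-sum approximation with areas known from elementary geometry. The sketch below assumes a simple midpoint rule; the helper name `surface_area` and the two test curves are illustrative choices, not part of the text.

```python
import math

def surface_area(f, df, a, b, n=10_000):
    """Midpoint-rule approximation of Formula (4):
    the integral of 2*pi*y*sqrt(1 + (dy/dx)**2) dx over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += 2 * math.pi * f(x) * math.sqrt(1 + df(x) ** 2) * h
    return total

# Cone: revolve y = x, 0 <= x <= 1. Elementary geometry gives
# pi * radius * slant height = pi * sqrt(2).
cone = surface_area(lambda x: x, lambda x: 1.0, 0.0, 1.0)
print(cone, math.pi * math.sqrt(2))

# Zone of the unit sphere: revolve y = sqrt(1 - x**2), 0 <= x <= 1/2.
# Archimedes' result gives 2*pi*(radius)*(width) = pi.
zone = surface_area(lambda x: math.sqrt(1 - x * x),
                    lambda x: -x / math.sqrt(1 - x * x), 0.0, 0.5)
print(zone, math.pi)
```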

These simple examples illustrate the principal features of the analytical 
or symbolic calculus of Leibniz—the central role of infinitesimal dif- 
ferences (differentials) and sums (integrals), and the inverse relationship 
between them; the characteristic triangle as a link between tangent (dif- 
ferential) and quadrature (integral) problems; the transformation of in- 
tegrals by means of substitutions—and the manner in which this calculus 
does, indeed, guide the mind in the formal derivation of correct results. 

In this chapter we outline the stages by which Leibniz gradually dis- 
covered and elaborated his calculus. The first crucial steps were taken 
during his Paris years, 1672—76, eight or ten years after Newton’s formative 
period. However, Leibniz’ first publication of the calculus was in 1684, 
twenty years prior to the publication of Newton’s De Quadratura in 1704. 
In the final section of the chapter we discuss briefly the chief differences 
between the Newtonian and Leibnizian approaches to the calculus, and the 
unfortunate priority dispute between their respective followers that took 
place in the early eighteenth century. 

In 1714, two years before his death, Leibniz composed the essay Historia 
et origo calculi differentialis (History and Origin of the Differential Calcu- 
lus), opening with the lines (in the English translation of this extract 
provided by Weil [12]) 


It is most useful that the true origins of memorable inventions be known, 
especially of those which were conceived not by accident but by an effort 
of meditation. The use of this is not merely that history may give 
everyone his due and others be spurred by the expectation of similar 
praise, but also that the art of discovery may be promoted and its method 
become known through brilliant examples. One of the noblest inventions 
of our time has been a new kind of mathematical analysis, known as the 
differential calculus; but while its substance has been adequately ex- 
plained, its source and original motivation have not been made public. It 
is almost forty years now that its author invented it. . . . 


English translations of the complete Historia et origo, and of a number of 
letters and manuscripts supplying additional details, are available in the 
volume of J. M. Child [5], to which frequent reference will be made in this 
chapter. In addition, Struik’s source book [11] contains English transla- 
tions of three of Leibniz’ earliest published papers on the calculus. 


The Beginning—Sums and Differences 


In the Historia et origo and elsewhere, Leibniz always traced his inspiration 
for the calculus back to his early work with sequences of sums and 
differences of numbers. As a young student he had been interested in 
simple number properties, and in 1666 had published an essay entitled De 
arte combinatoria (On the Art of Combinations) that dealt with elementary 
properties of combinations and permutations. 

Shortly after his arrival in Paris in 1672, he noticed an interesting fact 
about the sum of the differences of consecutive terms of a sequence of 
numbers. Given the sequence 


$$ a_0, a_1, a_2, \ldots, a_n $$

consider the sequence 

$$ d_1, d_2, \ldots, d_n $$

of differences, $d_i = a_i - a_{i-1}$. Then 

$$ d_1 + d_2 + \cdots + d_n = (a_1 - a_0) + (a_2 - a_1) + \cdots + (a_n - a_{n-1}) = a_n - a_0. \tag{5} $$


Thus the sum of the consecutive differences equals the difference of the first 
and last terms of the original sequence. 

As an example, he observed that the “difference sequence” of the 
sequence of squares, 


$$ 0, 1, 4, \ldots, n^2 $$

is the sequence of consecutive odd numbers, 

$$ 1, 3, 5, \ldots, 2n - 1, $$

because $i^2 - (i-1)^2 = 2i - 1$. It follows that the sum of the first n odd 
numbers is $n^2$, 

$$ 1 + 3 + 5 + \cdots + (2n-1) = n^2. \tag{6} $$
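The telescoping in Formula (5) and its consequence (6) are easy to confirm on a short list. This is a minimal sketch; the helper name `differences` and the choice n = 10 are assumptions made only for illustration.

```python
def differences(seq):
    """Consecutive differences d_i = a_i - a_(i-1) of a finite sequence a_0, ..., a_n."""
    return [seq[i] - seq[i - 1] for i in range(1, len(seq))]

n = 10
squares = [i * i for i in range(n + 1)]   # 0, 1, 4, ..., n**2
d = differences(squares)                  # the odd numbers 1, 3, 5, ..., 2n - 1

print(d)
# Formula (5): the sum of the differences equals last term minus first term.
print(sum(d), squares[-1] - squares[0])
```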


EXERCISE 1. (a) By adding 2 + 4 + ··· + 2n to both sides of (6), show that 

$$ 1 + 2 + 3 + \cdots + 2n = \frac{2n}{2}(2n+1). $$

(b) By adding (2n + 1) to both sides of the result of (a), show that 

$$ 1 + 2 + 3 + \cdots + 2n + (2n+1) = \frac{2n+1}{2}(2n+2). $$

Note that (a) and (b) together yield the familiar result 

$$ 1 + 2 + 3 + \cdots + n = \frac{n}{2}(n+1) $$

for all positive integers n. 


EXERCISE 2. Apply (5) to the sequence of cubes 

$$ 0, 1, 8, \ldots, n^3 $$

to obtain 

$$ 3\sum_{i=1}^{n} i^2 - 3\sum_{i=1}^{n} i + n = n^3. $$

Solve this equation, using the result of Exercise 1, for 

$$ \sum_{i=1}^{n} i^2 = 1^2 + 2^2 + \cdots + n^2 = \frac{n}{6}(n+1)(2n+1). $$


His result on sums of differences also suggested to Leibniz the possibility 
of summing an infinite series of numbers. Suppose the numbers 


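$$ b_1, b_2, \ldots, b_n, \ldots $$

are the differences of consecutive terms of the sequence 

$$ a_1, a_2, \ldots, a_n, \ldots, $$

that is, $b_n = a_n - a_{n+1}$. Then 

$$ b_1 + b_2 + \cdots + b_n = a_1 - a_{n+1}. $$

If, in addition, $\lim_{n\to\infty} a_n = 0$, then it follows that 

$$ \sum_{n=1}^{\infty} b_n = a_1. \tag{7} $$
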
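The finite identity behind Formula (7) can be checked in exact arithmetic. The sketch below uses Python's `fractions` module; the helper name `partial_sum_of_differences` and the sample sequence a_n = 1/n are assumptions chosen for illustration.

```python
from fractions import Fraction

def partial_sum_of_differences(a, n):
    """Sum b_1 + ... + b_n where b_k = a(k) - a(k + 1), added term by term."""
    return sum(a(k) - a(k + 1) for k in range(1, n + 1))

a = lambda k: Fraction(1, k)   # illustrative sequence a_n = 1/n, which tends to 0

for n in (1, 10, 100):
    s = partial_sum_of_differences(a, n)
    # The partial sum equals a_1 - a_(n+1), so it approaches a_1 = 1.
    print(n, s, a(1) - a(n + 1))
```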
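EXERCISE 3. Apply (7) with $\{a_n\}_1^\infty$ being the sequence $1, \tfrac13, \tfrac15, \ldots,$ of reciprocals 
of the odd integers to show that 

$$ \frac{1}{1\cdot 3} + \frac{1}{3\cdot 5} + \frac{1}{5\cdot 7} + \cdots = \frac12. $$
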

EXERCISE 4. Noting that the sequence of differences of the terms of the geometric 
progression 

$$ 1, a, a^2, \ldots, a^n, \ldots $$

is 

$$ (1-a),\ (1-a)a,\ (1-a)a^2,\ \ldots,\ (1-a)a^n,\ \ldots, $$

apply Formula (7) to show that 

$$ \sum_{n=0}^{\infty} a^n = \frac{1}{1-a} $$

if 0 < a < 1. 


Not long after his arrival in Paris, Leibniz called on Christiaan Huygens 
(1629-1695), who was completing his comprehensive treatise Horologium 
oscillatorium (1673) on the theory of the pendulum clock, and was prob- 
ably the most renowned scientist on the continent. When Leibniz de- 
scribed his results on sums of differences, Huygens suggested that he try to 
find the sum of the series 


$$ \frac{1}{1} + \frac{1}{3} + \frac{1}{6} + \frac{1}{10} + \cdots + \frac{1}{n(n+1)/2} + \cdots \tag{8} $$

of reciprocals of the triangular numbers. This problem had arisen somewhat 
earlier in a discussion with Hudde on computing probabilities for certain 
games of chance. 

Huygens’ problem was an especially propitious one for Leibniz to 
consider at that time. From his earlier work he was familiar with the 
combinatorial or figurate numbers as they appear in Pascal’s “arithmetic 
triangle.” Let this triangle be written in the form 


1    1    1    1    1    1
1    2    3    4    5    6
1    3    6   10   15   21
1    4   10   20   35   56
1    5   15   35   70  126


The nth element of each row is the sum of the first n elements of the 
preceding row. Thus the nth triangular number (or figurate number of type 
1), the sum of the first n integers, is the nth element of the third row. Since 
the nth figurate number of type k is the sum of the first n figurate numbers 
of type k — 1 (see Formula (18) and Exercise 12 of Chapter 4), the (k + 2)th 
row consists of the figurate numbers of type k. Hence the arithmetic 
triangle exhibits the triangular numbers as sums of integers, the pyramidal 
numbers as sums of triangular numbers, etc. Conversely, the triangular 
numbers are differences of consecutive pyramidal numbers, etc. 

Leibniz saw that questions of the sort asked by Huygens could be 
answered by starting with the sequence of reciprocals of the integers, 
instead of the integers themselves, and constructing subsequent rows by 
taking differences rather than sums. In this way he obtained the following 
array, which he called his “harmonic triangle”. 

1/1   1/2   1/3   1/4    1/5    1/6   ...
1/2   1/6   1/12  1/20   1/30   1/42  ...
1/3   1/12  1/30  1/60   1/105  ...
1/4   1/20  1/60  1/140  ...
1/5   1/30  ...
...


Thus each row of the harmonic triangle is the sequence of differences of 
consecutive terms of the preceding row. Therefore Formula (7) implies that 
the sum of the terms of each row is equal to the first element of the 
preceding row. In particular, 


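$$ \frac{1}{2} + \frac{1}{6} + \frac{1}{12} + \frac{1}{20} + \cdots = 1, \tag{9} $$

$$ \frac{1}{3} + \frac{1}{12} + \frac{1}{30} + \frac{1}{60} + \cdots = \frac{1}{2}, \tag{10} $$

$$ \frac{1}{4} + \frac{1}{20} + \frac{1}{60} + \frac{1}{140} + \cdots = \frac{1}{3}. \tag{11} $$

The nth element of the second row of the harmonic triangle is 

$$ \frac{1}{n} - \frac{1}{n+1} = \frac{1}{n(n+1)}, $$

which is half the reciprocal of the nth triangular number n(n + 1)/2. 
Hence multiplication of Equation (9) by 2 yields the sum asked for by 
Huygens, 

$$ \frac{1}{1} + \frac{1}{3} + \frac{1}{6} + \frac{1}{10} + \cdots = 2. $$

Similarly, multiplication of Equation (10) by 3 yields the sum of the 
reciprocals of the pyramidal numbers, 

$$ \frac{1}{1} + \frac{1}{4} + \frac{1}{10} + \frac{1}{20} + \cdots = \frac{3}{2}. $$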
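The construction of the harmonic triangle by repeated differencing can be reproduced in exact rational arithmetic. The following is a sketch under stated assumptions: the function name `harmonic_triangle` and the truncation at 200 columns are illustrative, and only finite partial sums are computed, so the row sums (9) and (10) are approached rather than attained.

```python
from fractions import Fraction

def harmonic_triangle(rows, cols):
    """First `rows` rows of the harmonic triangle.

    The top row is 1/1, 1/2, ..., 1/cols; each later row holds the
    differences of consecutive terms of the row above, so it is one
    entry shorter.
    """
    triangle = [[Fraction(1, n) for n in range(1, cols + 1)]]
    for _ in range(rows - 1):
        prev = triangle[-1]
        triangle.append([prev[i] - prev[i + 1] for i in range(len(prev) - 1)])
    return triangle

tri = harmonic_triangle(rows=4, cols=200)
for row in tri:
    print([str(x) for x in row[:5]])

# Partial sums of the second and third rows approach 1 and 1/2,
# the first entries of the rows above them.
print(float(sum(tri[1])), float(sum(tri[2])))
```

Doubling the second row entry by entry gives the reciprocals of the triangular numbers, which is how the sum requested by Huygens comes out as 2.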


EXERCISE 5. Show that the nth element of the third row of the harmonic triangle is 
2/[n(n + 1)(n + 2)], which is one-third of the reciprocal of the nth pyramidal 
number. 


EXERCISE 6. Show by induction on k that the nth element of the (k + 1)st row of the 
harmonic triangle is 

$$ \frac{k!}{n(n+1)\cdots(n+k)} = \frac{1}{(k+1)F(n,k)}, $$

where F(n, k) denotes the nth figurate number of type k (see Formula (18) of 
Chapter 4). Conclude that 

$$ \sum_{n=1}^{\infty} \frac{1}{F(n,k)} = \frac{k+1}{k}. $$

Pascal’s arithmetic triangle and Leibniz’ harmonic triangle enjoy a 
certain inverse relationship with respect to their manners of formation 
—involving sums in the former case and differences in the latter. In the 
arithmetic triangle each row consists of sums of the terms in the preceding 
row, and differences of terms in the following row. In the harmonic 
triangle, however, each row consists of differences of the terms in the 
preceding row. 

These considerations implanted in Leibniz’ mind a vivid conception that 
was to play a dominant role in his development of the calculus—the 
notion of an inverse relationship between the operation of taking dif- 
ferences and that of forming sums of the elements of a sequence. 
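The inverse relationship between differencing and summing can be seen on any short list of numbers. In this sketch the operator names `delta` and `sigma` and the sample terms are assumptions made for illustration only.

```python
from itertools import accumulate

def delta(seq):
    """Difference sequence: a_2 - a_1, a_3 - a_2, ..."""
    return [b - a for a, b in zip(seq, seq[1:])]

def sigma(seq):
    """Sum sequence: a_1, a_1 + a_2, a_1 + a_2 + a_3, ..."""
    return list(accumulate(seq))

a = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary illustrative terms a_1, ..., a_8

print(delta(sigma(a)))   # differencing the sums recovers a_2, ..., a_8
print(sigma(delta(a)))   # summing the differences gives a_2 - a_1, ..., a_8 - a_1
```

Each operation undoes the other up to a shift of index or the subtraction of the first term, which is the discrete shadow of the relation between derivative and integral.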


EXERCISE 7. Given a sequence $\{a_n\}_1^\infty$, or $a_1, a_2, a_3, \ldots$, define the difference 
sequence 

$$ \Delta\{a_n\} = a_2 - a_1,\ a_3 - a_2,\ \ldots,\ a_{n+1} - a_n,\ \ldots $$

and the sum sequence 

$$ \Sigma\{a_n\} = a_1,\ a_1 + a_2,\ \ldots,\ \sum_{i=1}^{n} a_i,\ \ldots $$

Then show that 

$$ \Delta\Sigma\{a_n\} = \{a_{n+1}\} $$

and 

$$ \Sigma\Delta\{a_n\} = \{a_{n+1} - a_1\}. $$


The Characteristic Triangle 


At the time of his investigation of sum and difference sequences in late 
1672, Leibniz was still largely ignorant of the mathematical work that was 
then contemporary. In a 1680 letter to Tschirnhaus ([5], p. 215), he told of 
a memorable conversation with Huygens in early 1673 that led to his 
mathematical self-education. 


The prime occasion from which arose my discovery of the method of the 
Characteristic Triangle, and other things of the same sort, happened at a 
time when I had studied geometry for not more than six months. 
Huygens, as soon as he had published his book on the pendulum, gave me 
a copy of it; and at that time I was quite ignorant of Cartesian algebra 
and also of the method of indivisibles, indeed I did not know the correct 
definition of the center of gravity. For, when by chance I spoke of it to 
Huygens, I let him know that I thought that a straight line drawn through 
the center of gravity always cut a figure into two equal parts; since that 
clearly happened in the case of a square, or a circle, an ellipse, and other 
figures that have a center of magnitude, I imagined that it was the same 
for all other figures. Huygens laughed when he heard this, and told me 
that nothing was further from the truth. So I, excited by this stimulus, 
began to apply myself to the study of the more intricate geometry, 
although as a matter of fact I had not at that time really studied the 
Elements. But I found in practice that one could get on without a 
knowledge of the Elements, if only one was master of a few propositions. 
Huygens, who thought me a better geometer than I was, gave me to read 
the letters of Pascal, published under the name of Dettonville; and from 
these I gathered the method of indivisibles and centers of gravity, that is 
to say the well-known methods of Cavalieri and Guldinus. 


It was in his study of Pascal’s work that Leibniz found his famous 
“characteristic triangle.” In June 1658 Pascal had proposed a contest, with 
a closing date of 1 October 1658, for the solution of several problems 
concerning the cycloid—to find the area and centroid of an arbitrary 
segment of a cycloid, and to find the volumes and centroids of various 
solids of revolution obtained by revolving such a segment about either its 
base or its ordinate. In 1643 Roberval had shown that the area of the 
whole cycloid is three times that of its generating circle, and that the 
volume obtained by revolving the cycloid about its base is five-eighths that 
of the circumscribed cylinder (for this work by Roberval see Struik’s 
source book ([11], pp. 232-238)). 

Most of the leading mathematicians of the day followed this contest 
with interest, and a number of them submitted proposed solutions. After 
none of the submitted solutions had been judged fully acceptable, Pascal 
published his own work on the cycloid and related problems in the form of 
Lettres de A. Dettonville (the pseudonym of Amos Dettonville being an 
anagram on Louis, or Lovis, de Montalte, the pseudonym under which 
Pascal’s Lettres provinciales had appeared). 

In the Historia et origo Leibniz, referring to himself in the third person, 
described his decisive discovery of the characteristic triangle as follows ([5], 
p. 38). 


From one example given by Dettonville, a light suddenly burst upon him, 
which strange to say Pascal himself had not perceived in it. For when he 
proves the theorem of Archimedes for measuring the surface of a sphere 
or parts of it, he used a method in which the whole surface of the solid 
formed by a rotation round any axis can be reduced to an equivalent 
plane figure. From it our young friend made out for himself the following 
general theorem. Portions of a straight line normal to a curve, intercepted 
between the curve and an axis, when taken in order and applied at right 
angles to the axis give rise to a figure equivalent to the moment of the 
curve about the axis. 


The Pascal reference here is to the short “treatise on the sines of a 
quadrant of a circle” which is part of the first Dettonville letter (see 
Struik’s source book ([11], pp. 239-241) for an English translation). In its 
Proposition 1 Pascal proved that “the sum of the sines [ordinates] of any 
arc of a quadrant [of a circle] is equal to the portion of the base between 
the extreme sines multiplied by the radius.” The use of the word sine for 
ordinate connotes the fact that, in the sum referred to, each ordinate is to 
be multiplied by a corresponding infinitesimal arc ds of the circle (rather 
than by an infinitesimal segment dx of the base). 

To prove the proposition, Pascal constructed the right triangle $E_1E_2K$ 
with hypotenuse $E_1E_2$ tangent to the circle at a typical point D, and then 
noted that the triangles $E_1E_2K$ and ADI are similar (see Fig. 2). Therefore 

$$ \frac{AD}{E_1E_2} = \frac{DI}{E_2K}, \quad\text{or}\quad DI \cdot E_1E_2 = AD \cdot E_2K = AD \cdot R_1R_2. $$

Thus, if y = DI, a = AD, Δs = $E_1E_2$, Δx = $R_1R_2$, then y Δs = a Δx. Regarding 
Δs and Δx as indivisibles and summing up, we see that Pascal’s result 
is 

$$ \int y\,ds = \int a\,dx. \tag{12} $$

Figure 2 

Because 2πy ds is the area of an infinitesimal zone on the hemisphere of 
radius a that is obtained by revolving the quarter-circle around the x-axis, 
it follows that the area of the hemisphere is 

$$ A = \int 2\pi y\,ds = 2\pi a \int dx = 2\pi a^2. $$

Thus (12) provides an infinitesimal derivation of the area formula $A = 4\pi a^2$ 
for the sphere of radius a. 


EXERCISE 8. Apply (12) to the arc of the quarter-circle corresponding to α ≤ θ ≤ β 
to conclude that 

$$ \int_\alpha^\beta \sin\theta\,d\theta = \cos\alpha - \cos\beta. $$


Leibniz’ “burst of light” consisted of noticing the quite general applica- 
tion of Pascal’s infinitesimal triangle construction to an arbitrary curve, 
with the role of the radius of the circle played by the normal to the given 
curve. Thus, from the similarity of the triangles shown in Figure 3, it 
follows that 

$$ \frac{ds}{n} = \frac{dx}{y}, \quad\text{or}\quad y\,ds = n\,dx. $$

Figure 3 

Summation of infinitesimals then gives 

$$ \int y\,ds = \int n\,dx. \tag{13} $$


Since Leibniz did not invent his differential-integral notation until two 
years later in 1675, he had to express (13) in verbal form: the moment of 
the given curve about the x-axis is equal to the area under a second curve 
whose ordinate is the normal n to the given curve (see the Historia et 
origo ([5], pp. 38-41) for Leibniz’ presentation of (13) together with 
Formulas (14) and (15) below). Multiplication of the moment by 2π gives the 
area $A = \int 2\pi y\,ds$ of the surface of revolution obtained by rotating the 
original curve around the x-axis. When Leibniz showed this result to 
Huygens the latter “confessed to him that by the help of this very theorem 
he had found the surface of parabolic conoids [paraboloids] and others of 
the same sort, stated without proof many years before” [in 1657]. 


EXERCISE 9. Consider the parabola $y = \sqrt{x}$, 0 ≤ x ≤ a. Knowing that $D\sqrt{x} = 1/(2\sqrt{x})$, 
show that the normal to the parabola is $n = \tfrac12\sqrt{4x+1}$. Hence apply Formula (13) 
to show that the area of the paraboloid obtained by revolving this parabola around 
the x-axis is 

$$ A = \int 2\pi y\,ds = \pi \int_0^a \sqrt{4x+1}\,dx = \frac{\pi}{6}\left[(4a+1)^{3/2} - 1\right]. $$


At the same time, Leibniz saw how to apply the characteristic triangle 
method to rectification and quadrature problems. Given a curve whose 
arclength is sought, let t denote the length of the tangent line intercepted 
between the x-axis and a vertical ordinate of (constant) length a. Then 
from the similarity of the triangles in Figure 4 it follows that 

$$ \frac{ds}{t} = \frac{dy}{a}, \quad\text{or}\quad a\,ds = t\,dy. $$

Figure 4 

Hence 

$$ \int a\,ds = \int t\,dy, \tag{14} $$

so the rectification of the given curve reduces to a quadrature problem, 
namely the calculation of the area of the region between the y-axis and a 
second curve whose abscissa x is the tangent t to the given curve. 


EXERCISE 10. Consider the semi-cubical parabola $y = x^{2/3}$, 0 ≤ x ≤ 8. Knowing that 
$Dx^{2/3} = 2x^{-1/3}/3$, show that the tangent (taking a = 1) is $t = \tfrac12\sqrt{9y+4}$. Hence 
apply Formula (14) to show that the arclength of this semi-cubical parabola is 

$$ s = \int_0^4 \tfrac12\sqrt{9y+4}\,dy = \tfrac{1}{27}\left[(40)^{3/2} - 8\right]. $$


For Leibniz’ third application of the characteristic triangle, note that the 
similarity of the triangles in Figure 3 implies that 

$$ \frac{\nu}{y} = \frac{dy}{dx}, \quad\text{or}\quad \nu\,dx = y\,dy, $$

where ν is the subnormal to the given curve. Hence 

$$ \int \nu\,dx = \int y\,dy. \tag{15} $$

Leibniz noted that, if the given curve passes through the origin and its base 
interval is [0, b], then the right-hand integral in (15) is simply the area ½b² 
of a triangle with base and height equal to b: “straight lines that continually 
increase from zero, when each is multiplied by its element of increase, 
form together a triangle.” 


Thus, to find the area of a given figure, another figure is sought such that 
its subnormals are respectively equal to the ordinates of the given figure, 
and then this second figure is the quadratrix of the given one; and thus 
from this extremely elegant consideration we obtain the reduction of the 
areas of surfaces described by rotation to plane quadratures [For- 
mula (13)], as well as the rectification of curves [Formula (14)]; at the 
same time we can reduce these quadratures of figures to an inverse 
problem of tangents [Formula (15)] (see [5], p. 41). 

Since ν = y(dy/dx), Formula (15) says that 

$$ \int y\,\frac{dy}{dx}\,dx = \int y\,dy. $$


This was the first appearance of two ideas that were to play central roles in 
Leibniz’ calculus—the transformation of integrals by means of substitu- 
tions, and the reduction of quadrature problems to inverse tangent prob- 
lems, the latter being problems in which a curve is to be determined from a 
knowledge of its tangent line. 

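To illustrate the way in which Formula (15) reduces quadratures to 
inverse tangent problems, suppose we want to find the area $\int_0^a x^n\,dx$ under 
the curve $z = x^n$. If we can find a curve y = f(x) with subnormal $\nu = x^n$, 
then Formula (15) will yield 

$$ \int_0^a x^n\,dx = \int_0^a \nu\,dx = \left[\tfrac12 y^2\right]_{x=0}^{x=a} = \tfrac12 [f(a)]^2, $$

assuming that f(0) = 0. Trying $y = bx^k$, we want 

$$ \nu = y\,\frac{dy}{dx} = bx^k \cdot bkx^{k-1} = b^2kx^{2k-1} = x^n. $$

This requires that 

$$ k = \tfrac12(n+1) \quad\text{and}\quad b = \left[\tfrac12(n+1)\right]^{-1/2}. $$

It follows that 

$$ \int_0^a x^n\,dx = \tfrac12 b^2 a^{2k} = \frac{a^{n+1}}{n+1}. $$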

In these investigations of 1673, his first year of serious work in mathe- 
matics, Leibniz obtained few if any results that were actually new, that is, 
no specific quadratures or rectifications that had not been discovered 
previously by others. Even his touchstone, the characteristic triangle, was 
implicit in Pascal’s work (and fairly explicit in Barrow’s Geometrical 
Lectures). But he took significant first steps towards his real goal—the 
development of a general algorithmic method that would unify the diverse 
results and techniques that he found in the existing mathematical literature. 
Two decades later, in a letter to l’Hospital, he summarized these first 
steps as follows ([5], pp. 220-222). 


[With the] use of what I call the “characteristic triangle”, formed from the 
elements of the coordinates and the curve, I thus found as it were in the 
twinkling of an eyelid nearly all the theorems that I afterward found in 
the works of Barrow and Gregory. Up to that time, I was not sufficiently 
versed in the calculus [algebra] of Descartes, and as yet did not make use 
of equations to express the nature of curved lines; but, on the advice of 
Huygens, I set to work at it, and I was far from sorry that I did so: for it 
gave me the means almost immediately of finding my differential calcu- 
lus. This was as follows. I had for some time previously taken a pleasure 
in finding the sums of series of numbers, and for this I had made use of 
the well-known theorem, that, in a series decreasing to infinity, the first 
term is equal to the sum of all the differences. From this I had obtained 
what I call the “harmonic triangle,” as opposed to the “arithmetical 
triangle” of Pascal. . . . Recognizing from this the great utility of 
differences and seeing that by the calculus of M. Descartes the ordinates of 
the curve could be expressed numerically, I saw that to find quadratures 
or the sums of the ordinates was the same thing as to find an ordinate 
(that of the quadratrix), of which the difference is proportional to the 
given ordinate. I also recognized almost immediately that to find tangents 
is nothing else but to find differences, and that to find quadratures is 
nothing else but to find sums, provided that one supposes that the 
differences are incomparably small. 


Transmutation and the Arithmetical Quadrature of 
the Circle 


In late 1673 or early 1674 Leibniz discovered a general “transmutation” or 
transformation method with which he could derive essentially all of the 
previously known plane quadrature results. He described its advantages in 
his reply to Newton’s Epistola prior (see [9], pp. 65-66). 


My method is but a corollary of a general theory of transformations, by 
the help of which any given figure whatever, by whatever equation it may 
be accurately stated, is reduced to another analytically equivalent fig- 
ure... Furthermore, the general method of transformations itself seems 
to me proper to be counted among the most powerful methods of 
analysis, for not merely does it serve for infinite series and approxima- 
tions, but also for geometrical solutions and endless other things that are 
scarcely manageable otherwise. . . . The basis of the transformation is 
this: that a given figure, with innumerable lines [ordinates] drawn in any 
way (provided they are drawn according to some rule or law), may be 
resolved into parts, and that the parts—or others equal to them—when 
reassembled in another position or another form compose another figure, 
equivalent to the former or of the same area even if the shape is quite 
different; whence in many ways the quadratures can be attained. . . . 

These steps are such that they occur at once to anyone who proceeds 
methodically under the guidance of Nature herself; and they contain the 
true method of indivisibles as most generally conceived and, as far as I 
know, not hitherto expounded with sufficient generality. For not merely 
parallel and convergent straight lines, but any other lines also, straight or 
curved, that are constructed by a definite law can be applied to the 
resolution [of the original figure into parts that are to be reassembled to 
form another figure]; but he who has grasped the universality of the 
method will judge how great and how abstruse are the results that can 
thence be obtained: For it is certain that all squarings hitherto known, 
whether absolute or hypothetical, are but limited specimens of this. 


Here Leibniz describes in very general terms the following principle, to 
which the term “transmutation” was applied during the seventeenth 
century. Let A and B be two plane (or space) regions, each subdivided into 
“indivisibles”, generally infinitely small rectangles (or prisms). If there is a 
one-to-one correspondence between the indivisibles in A and those in B, 
such that corresponding indivisibles have equal areas (or volumes), then it 
is said that B is derived from A by a “transmutation”, and we conclude 
that A and B have equal areas (or volumes). 

As we saw in Chapter 4, this principle was the basis for the computa- 
tions of Cavalieri and others who used rectangular indivisibles. It enabled 
them to accomplish (in a variety of special cases) what would be done now 
by means of changes of variable and integration by parts. Although he 
described the inherent possibilities somewhat more generally in his letter to 
Newton, Leibniz’ main innovation in practice was the use of triangular 
indivisibles in a systematic transformation process. 

Given neighboring points P(x, y) and Q(x + dx, y+ dy) on the curve 
y = f(x), x €[a, b], Leibniz considers the infinitesimal triangle OPQ, where 
O is the origin. Let the tangent line determined by the infinitesimal arc ds 
joining P and Q intersect the y-axis at the point T(0, z) (see Fig. 5), where 

$$ z = y - x\,\frac{dy}{dx}, \tag{16} $$

and denote by OS the perpendicular segment of length p from O to this 
tangent line (extended). Then the triangle OST is similar to the characteristic 
triangle PRQ, so it follows that dx/p = ds/z. Hence the area of the 
infinitesimal triangle OPQ is 

$$ a(OPQ) = \tfrac12\,p\,ds = \tfrac12\,z\,dx. \tag{17} $$

Figure 5 


If we consider the sector OAB, bounded by the graph AB of y = f(x) 
and the radii OA and OB, as being subdivided into infinitesimal triangles 
like OPQ, then it follows from (17) that 


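$$ a(OAB) = \frac12 \int_a^b z\,dx, \tag{18} $$

where z = g(x) is defined by (16). But 

$$ \int_a^b y\,dx = \tfrac12\,b f(b) - \tfrac12\,a f(a) + a(OAB) = \tfrac12\,[xy]_a^b + a(OAB), $$

so it follows from (18) that 

$$ \int_a^b y\,dx = \frac12\left([xy]_a^b + \int_a^b z\,dx\right). \tag{19} $$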


Formula (19) is Leibniz’ “transmutation theorem.” Its significance (like
that of Formula (15)) is that it established an inverse relationship between
the tangent problem (since z is defined in terms of the tangent) and the
quadrature problem (of computing $\int_a^b y\,dx$). Moreover, a new curve
z = g(x) was introduced to serve as a “quadratrix” for the original curve
y = f(x), in case $\int_a^b z\,dx$ turned out to be a simpler integral in terms of
which $\int_a^b y\,dx$ could be evaluated. Note also that the substitution of
z = y − x(dy/dx) into (19) yields the integration by parts formula

$$\int_a^b y\,dx = [xy]_a^b - \int_{f(a)}^{f(b)} x\,dy.$$
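As a numerical illustration (not part of Leibniz' argument), the following minimal sketch compares the two sides of Formula (19) for a sample curve. The function names, the choice y = sin x on [0.5, 2], and the midpoint-rule quadrature are assumptions made only for this check.

```python
import math

def integrate(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def transmutation_sides(f, df, a, b):
    """Return both sides of Formula (19) for y = f(x), with df = dy/dx."""
    z = lambda x: f(x) - x * df(x)      # Formula (16): z = y - x dy/dx
    left = integrate(f, a, b)           # integral of y dx
    right = 0.5 * ((b * f(b) - a * f(a)) + integrate(z, a, b))
    return left, right

if __name__ == "__main__":
    left, right = transmutation_sides(math.sin, math.cos, 0.5, 2.0)
    print(left, right)   # the two values should agree closely
```

Any differentiable sample curve could be substituted for the sine; the agreement of the two values reflects the integration by parts identity noted above.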


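EXERCISE 11. Consider the “higher parabola” $y^q = x^p$, q > p > 0. Show that

$$\frac{dy}{dx} = \frac{p}{q}\,\frac{y}{x}, \quad\text{so}\quad z = \frac{q-p}{q}\,y.$$

Conclude from the transmutation formula that

$$\int_0^b x^{p/q}\,dx = \frac{q}{p+q}\,[xy]_0^b = \frac{q}{p+q}\,b^{(p+q)/q}.$$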


Leibniz’ most interesting application of the transmutation theorem was
his so-called “arithmetical quadrature of the circle”: the derivation of the
famous infinite series

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \tag{20}$$

that now bears his name.

Figure 6


The upper half of the unit circle tangent to the y-axis at the origin
(Fig. 6) is the graph of $y = \sqrt{2x - x^2}$. Since

$$\frac{dy}{dx} = \frac{1-x}{y},$$

we find that

$$z = y - x\,\frac{1-x}{y} = \frac{x}{y} = \sqrt{\frac{x}{2-x}},$$

or

$$x = \frac{2z^2}{1+z^2}.$$

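Leibniz then applies the transmutation formula to compute the area of the
quarter-circle as follows.

$$\begin{aligned}
\frac{\pi}{4} &= \int_0^1 y\,dx \\
&= \frac{1}{2}\left(\left[x\sqrt{2x-x^2}\,\right]_0^1 + \int_0^1 z\,dx\right) && \text{(by (19))} \\
&= \frac{1}{2}\left[1 + \left(1 - \int_0^1 x\,dz\right)\right] && \text{(Figure 7)} \\
&= 1 - \int_0^1 \frac{z^2}{1+z^2}\,dz \\
&= 1 - \int_0^1 \left(z^2 - z^4 + z^6 - \cdots\right)dz && \text{(geometric series)} \\
&= 1 - \left[\tfrac{1}{3}z^3 - \tfrac{1}{5}z^5 + \tfrac{1}{7}z^7 - \cdots\right]_0^1 && \text{(termwise integration)} \\
\frac{\pi}{4} &= 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots,
\end{aligned}$$

ignoring the question of convergence when z = 1.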
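The two ends of this computation can be checked numerically. The sketch below is only an illustration under assumed names: it evaluates the integral form $1 - \int_0^1 z^2/(1+z^2)\,dz$ by a midpoint rule and sums the first terms of the series, so that both can be compared with π/4.

```python
import math

def leibniz_partial_sum(n_terms):
    """Sum of the first n_terms terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

def quarter_circle_via_z(n=100000):
    """1 - integral_0^1 z^2/(1+z^2) dz, by the composite midpoint rule."""
    h = 1.0 / n
    integral = h * sum(
        ((i + 0.5) * h) ** 2 / (1 + ((i + 0.5) * h) ** 2) for i in range(n)
    )
    return 1 - integral

if __name__ == "__main__":
    print(math.pi / 4, quarter_circle_via_z(), leibniz_partial_sum(1000))
```

The series converges slowly: by the alternating series estimate, the error after n terms is at most 1/(2n + 1), so a thousand terms give only about three correct decimal places.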
Figure 7


EXERCISE 12. Consider the portion of the rectangular hyperbola defined by
$y = \sqrt{x^2 + 2x}$, x > 0. Show by Leibniz’ transmutation method, i.e., by a computation
analogous to his arithmetical quadrature of the circle, that the area of the
shaded region in Figure 8 is

$$\int_0^x \sqrt{x^2 + 2x}\,dx = \left(3 - \tfrac{1}{3}\right)z^3 + \left(5 - \tfrac{1}{5}\right)z^5 + \cdots$$


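Leibniz was intrigued by the comparison between the series

$$\begin{aligned}
\frac{\pi}{8} &= \left(\frac{1}{2} - \frac{1}{6}\right) + \left(\frac{1}{10} - \frac{1}{14}\right) + \left(\frac{1}{18} - \frac{1}{22}\right) + \cdots \\
&= \frac{1}{3} + \frac{1}{35} + \frac{1}{99} + \cdots \\
&= \frac{1}{2^2-1} + \frac{1}{6^2-1} + \frac{1}{10^2-1} + \cdots
\end{aligned}$$

and Mercator’s series in the form

$$\begin{aligned}
\frac{1}{4}\log 2 &= \frac{1}{4}\left(1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots\right) \\
&= \left(\frac{1}{4} - \frac{1}{8}\right) + \left(\frac{1}{12} - \frac{1}{16}\right) + \left(\frac{1}{20} - \frac{1}{24}\right) + \cdots \\
&= \frac{1}{3^2-1} + \frac{1}{7^2-1} + \frac{1}{11^2-1} + \cdots
\end{aligned}$$

Figure 8 (the hyperbola $y = \sqrt{x^2 + 2x}$ and its asymptote y = x + 1)

Figure 9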


Another impressive accomplishment of the transmutation method was 
Leibniz’ quadrature of a general segment of a cycloid (the first of Pascal’s 
contest problems). Figure 9 shows half of an arch of the cycloid that is 
generated by a circle of radius a rolling along the vertical line x = 2a. The 
length y of the ordinate AQ of the typical point Q on the cycloid is given 
by 


$$y = u + s,$$

where $u = \sqrt{2ax - x^2}$ is the length of the ordinate AP of the corresponding
point P on the generating circle, and s is the length of the circular arc
OP. That is, the length of the segment PQ is equal to that of the arc OP;
see Exercise 13 below for this standard property of the cycloid.

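The similarity of the characteristic triangle for the circle and the triangle
ABP implies that

$$\frac{du}{dx} = \frac{a-x}{u} \quad\text{and}\quad \frac{ds}{dx} = \frac{a}{u},$$

so

$$\frac{dy}{dx} = \frac{du}{dx} + \frac{ds}{dx} = \frac{2a-x}{u}.$$

Therefore, since $x(2a - x) = u^2$,

$$z = y - x\,\frac{dy}{dx} = (u+s) - u = s.$$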


The transmutation theorem therefore gives

$$\int_0^{x_1} s\,dx = \int_0^{x_1} z\,dx = 2\int_0^{x_1} y\,dx - x_1y_1 = 2\int_0^{x_1} (u+s)\,dx - x_1(u_1+s_1). \tag{23}$$

Hence

$$\int_0^{x_1} s\,dx = x_1(u_1+s_1) - 2\int_0^{x_1} u\,dx.$$

But subtraction of the triangle ABP from the circular sector OBP gives

$$\int_0^{x_1} u\,dx = \tfrac{1}{2}\,as_1 - \tfrac{1}{2}\,u_1(a - x_1).$$

It follows that

$$\int_0^{x_1} s\,dx = au_1 - s_1(a - x_1). \tag{24}$$

Finally, the area of the cycloidal segment over the interval $[0, x_1]$ is, from
(23) and (24),

$$\int_0^{x_1} y\,dx = \tfrac{1}{2}\,x_1y_1 + \tfrac{1}{2}\int_0^{x_1} s\,dx = \tfrac{1}{2}\,x_1y_1 + \tfrac{1}{2}\,au_1 - \tfrac{1}{2}\,s_1(a - x_1). \tag{25}$$

For example, with $x_1 = 2a$, $y_1 = \pi a$, $u_1 = 0$, $s_1 = \pi a$, Formula (25) gives
$3\pi a^2/2$ for the area of the half-arch shown in Figure 9.
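Formula (25) can be compared with a direct numerical quadrature of y = u + s. The sketch below assumes, as the derivation above suggests, that the generating circle has center (a, 0), so that the arc OP has length $s = a \arccos(1 - x/a)$; the function names and the midpoint rule are likewise assumptions for illustration.

```python
import math

def cycloid_parts(x, a):
    """Return (u, s): circle ordinate and arc length OP at abscissa x."""
    u = math.sqrt(max(2 * a * x - x * x, 0.0))
    # Assumed arc-length expression for a circle centred at (a, 0).
    s = a * math.acos(max(-1.0, min(1.0, 1 - x / a)))
    return u, s

def area_by_formula_25(x1, a):
    """Right-hand side of Formula (25)."""
    u1, s1 = cycloid_parts(x1, a)
    y1 = u1 + s1
    return 0.5 * x1 * y1 + 0.5 * a * u1 - 0.5 * s1 * (a - x1)

def area_by_quadrature(x1, a, n=100000):
    """Midpoint-rule approximation to the integral of y = u + s over [0, x1]."""
    h = x1 / n
    return h * sum(sum(cycloid_parts((i + 0.5) * h, a)) for i in range(n))

if __name__ == "__main__":
    print(area_by_formula_25(1.5, 1.0), area_by_quadrature(1.5, 1.0))
    print(area_by_formula_25(2.0, 1.0), 1.5 * math.pi)
```

With a = 1 and $x_1 = 2$ the formula reproduces the half-arch value 3π/2 quoted above.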


EXERCISE 13. Figure 10 shows the cycloid generated by a circle of radius a, with
parametric equations

$$x = a(t - \sin t), \qquad y = a(1 - \cos t).$$

Show that the length s of the segment PQ is equal to the length $a(\pi - t)$ of the
circular arc OP.

Figure 10


The Invention of the Analytical Calculus 


Leibniz recorded the invention of his analytical calculus in a series of 
somewhat disjointed notes that he wrote during late October and early 
November of 1675. We will refer to the English translations of these 
crucial notes provided by Child [5]. 

Given a curve described in terms of its abscissa x and ordinate y, 
Leibniz envisions a discrete sequence of infinitely many values of y 
associated with the corresponding sequence of values of x. The sequence of 
ordinates is in some way analogous to an ordinary sequence of numbers, 
and the abscissas (like subscripts) determine the order of this sequence. 
However, the difference between two successive values of y is assumed to 
be infinitesimal, or “negligible” compared with the y values themselves. 

At first Leibniz uses the letter $\ell$ to denote the infinitesimal difference
between two successive values of y, and designates sums by writing omn.
as an abbreviation of the Latin omnia. Thus, in the manuscript of October
29, he starts with his previous result $\tfrac{1}{2}y^2 = \int y\,dy$ written in the form

$$\frac{\overline{\text{omn.}\ \ell}^{\,2}}{2} = \text{omn.}\ \overline{\overline{\text{omn.}\ \ell}\ \frac{\ell}{a}}. \tag{26}$$

He uses the overbars in place of parentheses, and inserts the constant a = 1
to preserve dimensional homogeneity. Thus (26) means

$$\tfrac{1}{2}\left(\int dy\right)^2 = \int\left(\int dy\right)dy.$$


He remarks that “this is a very fine theorem, and one that is not at all 
obvious.” Continuing, he says 


Another theorem of the same kind is

$$\text{omn.}\ x\ell = x\ \text{omn.}\ \ell - \text{omn.}\ \text{omn.}\ \ell \tag{27}$$

where $\ell$ is taken to be a term of a progression [of differences], and x is the
number which expresses the position or order of the $\ell$ corresponding to it;
or x is the ordinal number and $\ell$ is the ordered thing.


Thus he is now talking about a sequence of differences of ordinates. 


Equation (27) amounts to

$$\int x\,dy = x\int dy - \int\left(\int dy\right)dx = xy - \int y\,dx.$$

In these early notes he often writes $\int y$, not making it clear whether $\int y\,dx$
or $\int y\,dy$ is intended.

It is actually at this point in the discussion that he introduces the 
integral symbol with the innocuous-looking remark 


It will be useful to write $\int$ for omn., so that $\int \ell$ = omn. $\ell$, or the sum of
the $\ell$’s. Thus,

$$\frac{\overline{\int \ell}^{\,2}}{2} = \int \overline{\overline{\int \ell}\ \frac{\ell}{a}}, \quad\text{and}\quad \int x\ell = x\int \ell - \int\!\!\int \ell. \tag{28}$$

He adds that “all these theorems are true for series in which the differences
of the terms bear to the terms themselves a ratio that is less than any
assignable quantity” [i.e., is infinitesimal].

Having introduced the symbol $\int$ (evidently an elongated S for “sum”),
he proceeds to investigate its rules of operation. For example, with $\ell = dx$
in the first of Equations (28), he recovers $\int x\,dx = \tfrac{1}{2}x^2$. Then, with $\ell = x\,dx$
in the second of Equations (28), he obtains

$$\int x^2\,dx = x\int x\,dx - \int\left(\int x\,dx\right)dx,$$

from which it follows that

$$\int x^2\,dx = \tfrac{1}{3}x^3.$$

Actually, in the October 29 manuscript he writes $\ell = y/d$, which becomes
the now familiar dy three days later in his November 1 manuscript.
The difference notation y/d first appears in his discussion of the inverse
tangent problem:

Given $\ell$, and its relation to x, to find $\int \ell$. This is to be obtained from the
contrary calculus, that is to say, suppose that $\int \ell = ya$. Let $\ell = ya/d$.
Then just as $\int$ will increase, so d will diminish the dimensions. But $\int$
means a sum, and d a difference. From the given y, we can always find
y/d or $\ell$, that is, the difference of the y’s. Hence one equation may be
transformed into the other.
In the manuscript of November 11, he poses the questions as to whether
d(uv) = (du)(dv) and d(v/u) = (dv)/(du), and answers in the negative by
noting that

$$d(x^2) = (x + dx)^2 - x^2 = 2x\,dx + (dx)^2 = 2x\,dx,$$

ignoring the higher-order infinitesimal, while

$$(dx)(dx) = (x + dx - x)(x + dx - x) = (dx)^2.$$


At this time Leibniz is still searching for the correct product and
quotient rules. Nevertheless, he can already use his embryonic calculus to
solve a non-trivial geometrical problem: to find the curve y = f(x) whose
subnormal $\nu$ is inversely proportional to its ordinate, that is,

$$\nu = \frac{b}{y}.$$

He starts with his previous result

$$\int \nu\,dx = \tfrac{1}{2}y^2 \qquad \text{(Formula (15))}.$$

Application of the inverse operator d gives

$$\nu\,dx = d\int \nu\,dx = d\left(\tfrac{1}{2}y^2\right) = y\,dy.$$

Substitution of $\nu = b/y$ then gives

$$\frac{b}{y}\,dx = y\,dy, \qquad b\,dx = y^2\,dy, \qquad \int b\,dx = \int y^2\,dy,$$

so

$$bx = \frac{y^3}{3}$$

is the equation of the desired curve. He proceeds to check this result by use
of Sluse’s tangent rule (Chapter 5), thereby verifying in a non-trivial
problem the validity of his calculus.
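The result can also be checked numerically. The sketch below is an illustration only: it takes the subnormal to be y·(dy/dx), estimates the derivative by a central difference, and uses an arbitrary value b = 2; all names are assumptions.

```python
def subnormal(y, x, h=1e-6):
    """Subnormal y * dy/dx, with dy/dx estimated by a central difference."""
    return y(x) * (y(x + h) - y(x - h)) / (2 * h)

b = 2.0                                   # arbitrary constant for the check
curve = lambda x: (3 * b * x) ** (1 / 3)  # solves b x = y^3 / 3 for y

if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0):
        print(x, subnormal(curve, x), b / curve(x))
```

At each sample abscissa the two printed values should agree to within the accuracy of the difference quotient.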


EXERCISE 14. Show by differentiation that the curve $bx = y^3/3$ has the subnormal
property $\nu = b/y$.


By July of 1676 (see [4], pp. 118-122), Leibniz consistently includes the
differential under the integral sign. In a manuscript dated November 1676
([5], pp. 124-127), he states clearly the rules for differentiation and integration
of powers,

$$dx^e = e\,x^{e-1}\,dx \quad\text{and}\quad \int x^e\,dx = \frac{x^{e+1}}{e+1},$$

with e not necessarily a positive integer. He adds the important remark
that “this reasoning is general, and it does not depend upon what the
progression for the x’s may be.” This is his way of saying that x may be a
function of the independent variable, rather than the independent variable
itself. This generality made possible the method of substitution for differentiating
compositions of functions (i.e., what we now call the chain
rule).

For example, to compute $d\sqrt{a + bz + cz^2}$ he substitutes $x = a + bz + cz^2$.
Noting that

$$d\sqrt{x} = \frac{dx}{2\sqrt{x}} \quad\text{and}\quad dx = (b + 2cz)\,dz,$$

it follows that

$$d\sqrt{a + bz + cz^2} = \frac{(b + 2cz)\,dz}{2\sqrt{a + bz + cz^2}}.$$


Previously, Leibniz had accepted Sluse’s tangent rule without proof. In the
November 1676 manuscript he shows how to derive it from his calculus.
For example, given

$$z = ay^2 + byx + cx^2 + fx + gy + h = 0, \tag{29}$$

he substitutes x + dx for x and y + dy for y, obtaining

$$\begin{aligned}
&ay^2 + 2ay\,dy + a(dy)^2 + byx + by\,dx + bx\,dy + b\,dx\,dy \\
&\quad + cx^2 + 2cx\,dx + c(dx)^2 + fx + f\,dx + gy + g\,dy + h = 0.
\end{aligned}$$

By (29) and the assumption that

$$a(dy)^2 + b\,dx\,dy + c(dx)^2 = 0,$$

there remains

$$2ay\,dy + by\,dx + bx\,dy + 2cx\,dx + f\,dx + g\,dy = 0,$$

so

$$\frac{dy}{dx} = -\frac{by + 2cx + f}{2ay + bx + g} = -\frac{\partial z/\partial x}{\partial z/\partial y},$$

in agreement with Sluse’s rule.
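The role of the discarded second-order terms can be made concrete with exact rational arithmetic. In the sketch below (hypothetical coefficients and names, chosen only for illustration), a point is placed on the conic (29), dy is set equal to the slope from Sluse's rule times a finite dx, and the value of z at the displaced point is compared with $a(dy)^2 + b\,dx\,dy + c(dx)^2$.

```python
from fractions import Fraction as F

def conic(x, y, a, b, c, f, g, h):
    """Left side z of Equation (29)."""
    return a*y*y + b*y*x + c*x*x + f*x + g*y + h

def sluse_slope(x, y, a, b, c, f, g):
    """dy/dx obtained from the first-order terms, as in the text."""
    return -(b*y + 2*c*x + f) / (2*a*y + b*x + g)

# Hypothetical coefficients; h is chosen so that the point (1, 2) lies on the curve.
a, b, c, f, g = F(2), F(3), F(-1), F(4), F(-5)
x, y = F(1), F(2)
h = -conic(x, y, a, b, c, f, g, 0)

slope = sluse_slope(x, y, a, b, c, f, g)
dx = F(1, 1000)
dy = slope * dx

residual = conic(x + dx, y + dy, a, b, c, f, g, h)   # z at the displaced point
discarded = a*dy*dy + b*dx*dy + c*dx*dx              # the neglected terms

if __name__ == "__main__":
    print(slope, residual, discarded)
```

Because the first-order terms cancel exactly for this choice of dy, the residual equals the discarded quantity, which is of order $(dx)^2$.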

In a manuscript dated 11 July 1677, and in an undated revision of it ([5],
pp. 134-144), Leibniz gives statements and proofs of the product and
quotient rules. To show that

$$d(xy) = x\,dy + y\,dx,$$

he writes

$$d(xy) = (x + dx)(y + dy) - xy = x\,dy + y\,dx + dx\,dy,$$

and remarks that “the omission of the quantity dx dy, which is infinitely
small in comparison with the rest, for it is supposed that dx and dy are
infinitely small, will leave x dy + y dx.” To show that

$$d\!\left(\frac{y}{x}\right) = \frac{x\,dy - y\,dx}{x^2},$$

he writes

$$d\!\left(\frac{y}{x}\right) = \frac{y + dy}{x + dx} - \frac{y}{x} = \frac{x\,dy - y\,dx}{x^2 + x\,dx},$$

“which becomes (if we write $x^2$ for $x^2 + x\,dx$, since x dx can be omitted as
being infinitely small in comparison with $x^2$) equal to $(x\,dy - y\,dx)/x^2$.”

Leibniz was careful to verify whenever possible agreement between the
results of his evolving analytical or operational calculus and the results of
familiar geometrical arguments. For example, he noted that the product
rule d(xy) = x dy + y dx agrees with the addition of areas in Figure 11,

$$\int x\,dy + \int y\,dx = xy.$$

Figure 11

Similarly, addition of moments about the x- and y-axes, respectively, gives

$$\int \tfrac{1}{2}y^2\,dx + \int xy\,dy = \tfrac{1}{2}xy^2$$

and

$$\int xy\,dx + \int \tfrac{1}{2}x^2\,dy = \tfrac{1}{2}x^2y.$$

However, as he remarked in the Historia ([5], pp. 55-56), “the calculus also
shows this without reference to any figure, for $\tfrac{1}{2}\,d(x^2y) = xy\,dx + \tfrac{1}{2}x^2\,dy$;
so that now there is need for no greater number of the fine theorems of
celebrated men for Archimedean geometry, than at most those given by
Euclid in his Book II or elsewhere, for ordinary geometry.” The calculus
has become a “sensible and palpable medium, which will guide the mind”!

In the revised 1677 manuscript the role of the infinitesimal characteristic
triangle is made explicit in the new calculus. A curve is now a polygon with
infinitely many angles and infinitesimal sides. The arclength element ds is
a side of this infinite-angled polygon, an infinitesimal straight line segment
joining two adjacent vertices, so

$$ds = \sqrt{(dx)^2 + (dy)^2} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx,$$

where dx and dy are the differences of the x- and y-coordinates of these
two adjacent vertices. Thus, for the parabola $y = \tfrac{1}{2}x^2$, the arclength is given
by

$$s = \int ds = \int \sqrt{1 + x^2}\,dx,$$

so the rectification of this parabola depends on the quadrature of the
hyperbola $y = \sqrt{1 + x^2}$.

Figure 12

In this manuscript the integral $\int y\,dx$ is clearly identified with a sum of
infinitesimal rectangles with heights y and width dx. Referring to Figure
12, Leibniz says ([5], p. 13)


I represent the area of a figure by the sum of all the rectangles contained
by the ordinates and the differences of the abscissae, i.e., by $B_1D_1 + B_2D_2 + B_3D_3$ + etc.
For the narrow triangles $C_1D_1C_2$, $C_2D_2C_3$, etc., since they
are infinitely small compared with the said rectangles, may be omitted
without risk; and thus I represent in my calculus the area of the figure by
$\int y\,dx$, or the rectangles contained by each y and the dx that corresponds
to it.


Next he introduces what we now call the fundamental theorem of
calculus: “we, now mounting to greater heights, obtain the area of a
figure by finding the figure of its summatrix or quadratrix.” Given a curve
with ordinate z, whose area is sought, suppose it is possible to find a curve
with ordinate y such that

$$\frac{dy}{dx} = \frac{z}{a},$$

where a is a constant (presumably included for the sake of dimensional
homogeneity). Then

$$z\,dx = a\,dy,$$

so the area under the original curve is

$$\int z\,dx = a\int dy = ay, \tag{30}$$

assuming (as usual with Leibniz) that the y-curve passes through the origin.
Thus quadrature problems reduce in Leibniz’ calculus to inverse tangent
problems. That is, in order to find the area under the curve with ordinate
z, it suffices to find a curve whose tangent satisfies the condition

$$\frac{dy}{dx} = z.$$

Subtracting the area over [0, a] from that over [0, b], and setting the constant a equal to 1 in
(30), it follows that

$$\int_a^b z\,dx = y(b) - y(a).$$

The First Publication of the Calculus 


Leibniz’ first published article on his differential calculus appeared in 1684
in the Leipzig periodical Acta Eruditorum. An English translation is included
in Struik’s source book ([11], pp. 272-280).

This first paper was entitled “A new method for maxima and minima as 
well as tangents, which is impeded neither by fractional nor by irrational 
quantities, and a remarkable type of calculus for this.” Differentials are 
introduced without much indication of the infinitesimal considerations that 
had been their motivation. Given an arbitrary number dx, dy is defined to 
be that number dy such that the ratio dy/dx is equal to the slope of the 
tangent line. By modern standards, this is not so bad, except that no real 
definition of the tangent line is supplied—“We have only to keep in mind 
that to find a tangent means to draw a line that connects two points of the 
curve at an infinitely small distance, or the continued side of a polygon 
with an infinite number of angles, which for us takes the place of the 
curve.” 

The mechanical rules for computing differentials of powers, products,
and quotients are stated without any explanation of their source. It is
pointed out that dv is positive when the ordinate v increases with increasing
x, while dv is negative when v is decreasing. Since “none of these cases
happens ... when v neither increases nor decreases, but is stationary,” the
necessary condition dv = 0 for a maximum or minimum, corresponding to
a horizontal tangent line, is noted. The necessary condition d(dv) = 0 for
an inflection point is explained similarly.
Figure 13


As a first application of his max-min method, Leibniz solves the following
problem. “Let two points C and E [Fig. 13] be given and a line SS in
the same plane. It is required to find a point F on SS such that when E
and C are connected with F the sum of the rectangle [product] of CF and a
given line h and the rectangle of FE and a given line r is as small as
possible.”


EXERCISE 15. With the notation indicated in Figure 13, the quantity to be minimized is

$$w = h\sqrt{(p - x)^2 + c^2} + r\sqrt{x^2 + e^2}.$$

Apply the condition dw = 0 to conclude that

$$\frac{h(p - x)}{\sqrt{(p - x)^2 + c^2}} = \frac{rx}{\sqrt{x^2 + e^2}}$$

or

$$\frac{\sin\alpha}{\sin\beta} = \frac{h}{r}.$$


Leibniz interprets this result as the law of refraction for a light ray passing from a 
medium of density r (with respect to the velocity of light) into one of density h, the 
line SS representing the interface between the two media. He adds that “other very 
learned men have sought in many devious ways what someone versed in this 
calculus can accomplish in these lines as by magic.” 


“And this is only the beginning of much more sublime Geometry,
pertaining to even the most difficult and most beautiful problems of
applied mathematics, which without our differential calculus or something
similar no one could attack with any such ease.” The 1684 paper concludes
with the solution of a problem of De Beaune that Descartes had been
unable to solve: to find the ordinate w of a curve whose subtangent t is a
constant, t = a. For such a curve (Fig. 14),

$$\frac{dw}{dx} = \frac{w}{t} = \frac{w}{a},$$

or

$$w = a\,\frac{dw}{dx}. \tag{31}$$

Figure 14

Leibniz considers a sequence of values of x with constant differences
dx = b. Then

$$w = \frac{a}{b}\,dw,$$

so the corresponding sequence of ordinates w is proportional to its
sequence of differences. Knowing that this is the characteristic property of
a geometrical progression (see Exercise 4), he concludes that “if the x form
an arithmetic progression, then the w form a geometric progression. In
other words, if the w are numbers, the x will be logarithms, so the [desired
curve] is logarithmic.”
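Leibniz' discrete argument can be replayed directly. The sketch below (function name and sample values a = 2, b = 0.001 are assumptions) builds the abscissas with constant difference dx = b and the ordinates with dw = (b/a)w, so that successive ordinates have the constant ratio 1 + b/a; for small b the ordinates stay close to the exponential $e^{x/a}$.

```python
import math

def de_beaune_sequence(a, b, steps, w0=1.0):
    """Abscissas with dx = b and ordinates with dw = (b/a) * w, starting at (0, w0)."""
    xs, ws = [0.0], [w0]
    for _ in range(steps):
        ws.append(ws[-1] + (b / a) * ws[-1])   # next ordinate: w + dw
        xs.append(xs[-1] + b)                  # next abscissa: x + dx
    return xs, ws

if __name__ == "__main__":
    xs, ws = de_beaune_sequence(a=2.0, b=0.001, steps=2000)
    print(ws[1] / ws[0], ws[-1] / ws[-2])      # constant ratio 1 + b/a
    print(ws[-1], math.exp(xs[-1] / 2.0))      # compare with e^(x/a)
```

The x values form an arithmetic progression and the w values a geometric one, which is the relation between logarithms and numbers that Leibniz invokes.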


EXERCISE 16. Integrate Equation (31) to show that $w = e^{x/a}$, or x = a log w, if w = 1
when x = 0.


EXERCISE 17. Sharpen Exercise 4 to prove that a series is geometric if and only if its 
terms are proportional to its differences. 


The integral and the symbol $\int$ first appeared in print in a paper
published by Leibniz in the Acta Eruditorum of 1686 (see [11],
pp. 281-282), where he presented the result expressed by Equation (15).
The fundamental theorem of calculus, with the proof discussed in the 
preceding section, appeared in the Acta Eruditorum of 1693 (see [11], 
pp. 282-284). 


Higher-Order Differentials 


We have seen that Leibniz’ infinitesimal calculus had its roots in a certain 
logical extrapolation—from the simple concepts of sum and difference 
sequences, for ordinary sequences of numbers, to the case of sequences of 
variables associated with a geometric curve. The curve is envisioned as an 
infinite-angled polygon with infinitely many infinitesimal sides, each of
which is coincident with a tangent line to the curve. The basic sequences of 
variables associated with the curve are then the sequences of abscissas x 
and ordinates y of the infinitely many vertices of this polygon. 

The difference of two successive values of x is the differential dx, and 
similarly for dy. It is assumed that the quantities dx and dy are non-zero 
but incomparably small, and therefore negligible, with respect to the values 
of the variables x and y. Similarly, it is assumed that a product of 
differentials, such as (dx)(dy) or $(dx)^2$, is in turn negligible in comparison
with the differentials dx and dy. On the basis of these assumptions, taken 
as operational rules, the standard differentiation formulas are derived. 

It is important to note that the differentials dx are fixed non-zero
quantities; they are neither variables approaching zero nor ones that are
intended to eventually approach zero. There is actually a sequence of
differentials dx (or dy) associated with the curve: it is simply the difference
sequence of the sequence of abscissas x (or ordinates y). This difference
sequence in turn has a difference sequence whose elements are the second-order
differentials

$$d(dx) = d\,dx = d^2x.$$

Similarly, $d^2y$ is the difference of successive differences of y values. By
taking differences iteratively, the higher-order differentials $d^kx = d(d^{k-1}x)$
and $d^ky = d(d^{k-1}y)$ are obtained.

It is assumed that $d^2y$ is incomparably small with respect to dy, and in
general that $d^ky$ is incomparably small with respect to $d^{k-1}y$. In addition,
it is assumed that a kth-order differential $d^ky$ is of the same order of
magnitude as a kth power $(dx)^k$ of a first-order differential, in the sense
that the quotient $d^ky/(dx)^k$ is a real number (except in singular cases). On
the basis of these assumptions, the product and quotient rules can be used
to compute differentials of differentials. For example,

$$d(x\,dy) = (dx)(dy) + x\,d^2y,$$

$$d^2(x^n) = d(nx^{n-1}\,dx) = n(n-1)x^{n-2}(dx)^2 + nx^{n-1}\,d^2x, \tag{32}$$

$$d\!\left(\frac{dy}{dx}\right) = \frac{(dx)(d^2y) - (dy)(d^2x)}{(dx)^2}. \tag{33}$$


EXERCISE 18. (a) Show that

$$d^2(uv) = u\,d^2v + 2(du)(dv) + (d^2u)v = (d^0u)(d^2v) + 2(du)(dv) + (d^2u)(d^0v),$$

where we write $d^0u = u$ and $d^0v = v$.

(b) Use induction on n to prove “Leibniz’ rule”

$$d^n(uv) = \sum_{p=0}^{n} \binom{n}{p}(d^pu)(d^{n-p}v). \tag{34}$$
= 
In Leibnizian computations with differentials, the choice of x as the
independent variable was effected by choosing the original sequence of x
values or abscissas as an arithmetic progression, so that dx is constant and
therefore $d^2x = 0$. Note that, with $d^2x = 0$, Formulas (32) and (33) become

$$d^2(x^n) = n(n-1)x^{n-2}(dx)^2 \quad\text{and}\quad d\!\left(\frac{dy}{dx}\right) = \frac{d^2y}{dx},$$

from which it follows that

$$\frac{d^2(x^n)}{(dx)^2} = n(n-1)x^{n-2}$$

and

$$\frac{d(dy/dx)}{dx} = \frac{d^2y}{(dx)^2}.$$
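A finite-difference analogue shows what the choice of an arithmetic progression accomplishes. In the sketch below (exact rational arithmetic; names and sample values are assumptions), the abscissas have constant difference dx, so their second differences vanish, and the quotient of the second difference of $x^n$ by $(dx)^2$ differs from $n(n-1)x^{n-2}$ only by a term proportional to dx.

```python
from fractions import Fraction as F

def differences(seq):
    """Difference sequence: each value minus its predecessor."""
    return [q - p for p, q in zip(seq, seq[1:])]

def second_order_quotient(n, x0, dx, terms=5):
    """Return (second differences of x, d2(x^n)/(dx)^2 at x0) for an arithmetic progression."""
    xs = [x0 + k * dx for k in range(terms)]   # arithmetic progression of abscissas
    ys = [x ** n for x in xs]
    d2x = differences(differences(xs))
    d2y = differences(differences(ys))
    return d2x, d2y[0] / dx ** 2

if __name__ == "__main__":
    d2x, quotient = second_order_quotient(3, F(2), F(1, 1000))
    print(d2x, quotient, 3 * 2 * F(2))         # quotient is 12 + 6 dx, versus 12
```

For n = 3 the finite quotient is exactly 6x + 6dx, so the discrepancy from 6x is itself infinitesimal in Leibniz' sense.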


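EXERCISE 19. Assuming that $d^2x = 0$, show by induction on n that

$$d\!\left(\frac{d^{n-1}y}{(dx)^{n-1}}\right) = \frac{d^ny}{(dx)^{n-1}},$$

so division by dx gives

$$\frac{d}{dx}\!\left(\frac{d^{n-1}y}{(dx)^{n-1}}\right) = \frac{d^ny}{(dx)^n}. \tag{35}$$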


Whereas higher-order differentials, in contrast with first-order ones, no 
longer are with us in ordinary everyday calculus, their legacy survives in 
the notation 


$$\frac{d^ny}{dx^n}$$


(Formula (35) without parentheses) for the nth derivative of the function y 
of the independent variable x. 

In a problem where the choice of y as the independent variable was
indicated, the sequence of ordinates y (rather than abscissas x) was taken
as an arithmetic progression, so that $d^2y = 0$ (instead of $d^2x = 0$). This
choice was referred to as the “specification of the progression of the
variables.” See the article by Bos ([2], pp. 25-35) for a discussion of the
consequences of this choice. In particular, this freedom of choice was
the basis for the method of integration by substitution. For example, if the
sequence of abscissas x is taken to be a sequence of squares, one has the
substitution $x = t^2$, $dx = 2t\,dt$, where t is the new independent variable with
$d^2t = 0$. As Leibniz himself put it, “in this way I can transform the given
quadrature into others in an infinity of ways, and thus find the one by
means of the other.”

Figure 15


To illustrate the geometric applications of higher-order differentials, we
include John Bernoulli’s derivation of a formula for the radius of curvature
at a point on a curve (as described by Bos ([2], pp. 36-37)). The radii OD
and BD in Figure 15 are perpendicular to the curve AB, and intersect at
the center of curvature D. The radius of curvature at B is r = BD, and the
arclength differential is ds = BO. From the fact that triangle BHJ is similar
to the characteristic triangle, it follows that

$$AH = x + y\,\frac{dy}{dx}.$$

Taking x as the independent variable so that $d^2x = 0$, it follows that GH,
the differential of AH, is given by

$$GH = d(AH) = d\!\left(x + y\,\frac{dy}{dx}\right),$$

$$GH = dx + \frac{(dy)^2 + y\,d^2y}{dx}. \tag{36}$$

Now the similarity of the triangles DGH and DCB gives the proportion

$$\frac{BC}{HG} = \frac{BD}{HD}, \tag{37}$$

in which

$$BC = \frac{(dx)^2 + (dy)^2}{dx}, \qquad BD = r,$$

and

$$HD = r - BH = r - \frac{y\sqrt{(dx)^2 + (dy)^2}}{dx}.$$

After substitution of these values and (36) into (37), the resulting equation
is easily solved for

$$r = -\frac{\left[(dx)^2 + (dy)^2\right]^{3/2}}{(dx)(d^2y)} = -\frac{(ds)^3}{(dx)(d^2y)}. \tag{38}$$

Division of the numerator and denominator in (38) by $(dx)^3$ gives the
familiar formula in terms of derivatives of y with respect to x,

$$r = \frac{(ds/dx)^3}{|d^2y/dx^2|} = \frac{\left[1 + (dy/dx)^2\right]^{3/2}}{|d^2y/dx^2|}.$$
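Formula (38) lends itself to a finite-difference experiment. The sketch below is an assumption-laden illustration rather than Bernoulli's procedure: it replaces dx, dy, and d²y by small finite differences with constant dx, takes the magnitude of (38), and compares the result with the derivative formula for the parabola $y = \tfrac{1}{2}x^2$, whose radius of curvature is $(1 + x^2)^{3/2}$.

```python
import math

def radius_of_curvature(f, x, dx=1e-4):
    """Magnitude of (ds)^3 / (dx * d2y), using finite differences with constant dx."""
    dy = f(x + dx) - f(x)                          # first difference of y
    d2y = f(x + 2 * dx) - 2 * f(x + dx) + f(x)     # second difference of y
    ds = math.hypot(dx, dy)                        # side of the "polygon"
    return ds ** 3 / (dx * abs(d2y))

if __name__ == "__main__":
    parabola = lambda x: 0.5 * x * x
    for x in (0.0, 1.0, 2.0):
        print(x, radius_of_curvature(parabola, x), (1 + x * x) ** 1.5)
```

The small residual discrepancy comes from using finite rather than infinitesimal differences and shrinks with dx.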


The Meaning of Leibniz’ Infinitesimals 


In his publications on the calculus, Leibniz stressed the routine and formal 
character of his rules for the calculation and manipulation of differentials, 
and asserted that the proper application of these rules of operation would 
invariably lead to correct and meaningful results, even if uncertainty 
remained as to the precise meaning of the infinitesimals that appeared in 
the computations. Indeed, it was the correctness of the results obtained 
that had been his guide in the formulation of his algorithms, and had 
confirmed his confidence in their operational validity. 

Mathematical tradition generally attributes to Leibniz a belief in the 
actual existence of infinitesimal quantities—an infinitesimal quantity being 
one that is non-zero, yet smaller than every positive real number—and 
allegations to this effect are sometimes found in discussions of twentieth 
century “non-standard analysis” (see Chapter 12). Nevertheless, Leibniz 
seems not to have committed himself on the question of the actual 
existence of infinitesimals, and he certainly expressed doubts on occasion 
(e.g., see [3], p. 219). At any rate, he recognized that the question of the
existence of infinitesimals is independent of the question as to whether 
computations with infinitesimals, carried out in accordance with the opera- 
tional rules of the calculus, lead to correct solutions of problems. Conse- 
quently, whether or not infinitesimals actually exist, they can serve as 
“fictions useful to abbreviate and to speak universally.” Leibniz gave a 
comprehensive statement of this point of view in an unpublished 
manuscript probably written sometime after 1700, in reply to criticisms of 
the calculus advanced in 1694 by the Dutch physician and geometer 
Bernard Nieuwentijdt ([5], pp. 149-150):

Whether infinite extensions [quantities] successively greater and greater, 
or infinitely small ones successively less and less, are legitimate considera- 
tions, is a matter that I own to be possibly open to question; but for him 
who would discuss these matters, it is not necessary to fall back upon 
metaphysical controversies, such as the composition of the continuum, or 
to make geometrical matters depend thereon. ... It will be sufficient if,
when we speak of infinitely great (or more strictly unlimited), or of
infinitely small quantities (i.e., the very least of those within our knowl- 
edge), it is understood that we mean quantities that are indefinitely great 
or indefinitely small, i.e., as great as you please, or as small as you please, 
so that the error that any one may assign may be less than a certain 
assigned quantity. Also, since in general it will appear that, when any 
small error is assigned, it can be shown that it should be less, it follows 
that the error is absolutely nothing ... If any one wishes to understand 
these [the infinitely great and infinitely small] as the ultimate things, or as 
truly infinite, it can be done, and that too without falling back upon a 
controversy about the reality of extensions, or of infinite continuums in 
general, or of the infinitely small, ay, even though he think that such 
things are utterly impossible; it will be sufficient simply to make use of 
them as a tool that has advantages for the purpose of the calculation, just 
as the algebraists retain imaginary roots with great profit. For they 
contain a handy means of reckoning, as can manifestly be verified in 
every case in a rigorous manner by the method already stated. 


Thus Leibniz presents his calculus of infinitesimals as an abbreviated 
form of the rigorous Greek method of exhaustion, one whose more concise 
language is better adapted to the art of discovery. The basis for his 
argument is that, given an equality between two expressions involving 
differentials, that has been obtained by discarding higher-order differen- 
tials, it could have been established rigorously (and more tediously) by 
substituting for each differential the corresponding finite difference, and 
then proving that the difference between the resulting expressions could be 
made arbitrarily small by choosing the finite differences sufficiently small. 
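The product rule gives the simplest instance of this argument. In the sketch below (exact rational arithmetic; the function name and sample values are assumptions), the finite difference of xy is compared with x dy + y dx: the discrepancy is exactly dx dy, and its size relative to the retained terms can be made as small as desired by shrinking the differences.

```python
from fractions import Fraction as F

def product_rule_error(x, y, dx, dy):
    """Return (absolute, relative) discrepancy between the finite difference of xy and x dy + y dx."""
    exact = (x + dx) * (y + dy) - x * y    # finite difference of the product
    rule = x * dy + y * dx                 # terms kept by the product rule
    return exact - rule, (exact - rule) / rule

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        step = F(1, 10 ** k)
        print(step, *product_rule_error(F(3), F(5), step, step))
```

With x = 3 and y = 5 the relative discrepancy is dx/8 when dx = dy, so each tenfold reduction of the differences reduces it tenfold.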

Finally, it should be mentioned that whereas Leibniz himself was some- 
what circumspect regarding the actual existence of infinitesimals, this 
appropriate caution was generally not shared by his immediate followers 
(such as the Bernoulli brothers), who uncritically accepted infinitesimals as 
genuine mathematical entities. Indeed, this freedom from doubts about the 
foundations of the calculus probably promoted the rapid development of 
the subject and its applications. 


Leibniz and Newton 


In this chapter and the previous one we have detailed the separate 
approaches of Newton and Leibniz to the development of the calculus as a 
new and coherent mathematical discipline. It is instructive, finally, to 
compare and contrast their two approaches. 

Leibniz’ devotion to the advantages of appropriate notation was so 
wholehearted that one could ask whether he invented the calculus or 
merely a particularly felicitous system of notation for the calculus. Of 
course the answer is that he did both; indeed, his differential and integral 
notation so captured the essence of his calculus as to make notation and 
concept virtually inseparable. Newton, on the other hand, had little interest 
in notational matters; neither suggestive nor consistent notation was of
great importance to him. 

Leibniz’ constant goal was the formulation of general methods and 
algorithms that could serve to unify the treatment of diverse problems. 
General methods are certainly implicit in all of Newton’s work, but his 
greater enthusiasm for the solution of particular problems is evident. The 
difference is one of emphasis—Leibniz emphasizes general techniques that 
can be applied to specific problems, whereas Newton emphasizes concrete 
results that can be generalized. 

In regard to the calculus itself, discrete infinitesimal differences of 
geometric variables played the central role in Leibniz’ approach, while 
Newton’s fundamental concept was the fluxion or time rate of change, 
based on intuitive ideas of continuous motion. As a consequence, Leibniz’ 
notation and terminology effectively disguise the limit concept, which by
contrast is fairly explicit in Newton’s calculus. 

For Leibniz, the separate differentials dx and dy are fundamental; their 
ratio dy /dx is “merely” a geometrically significant quotient. For Newton, 
however, especially in his later work, the derivative itself—as a ratio of 
fluxions or an “ultimate ratio of evanescent quantities” —is the heart of the 
matter. A second derivative is simply a fluxion of a fluxion, each fluxion 
involving only first-order infinitesimals, so Newton has no need of Leibniz’ 
higher-order infinitesimals. 

The integral of Newton is an indefinite integral, a fluent to be de- 
termined from its given fluxion; he solves area and volume problems by 
interpreting them as inverse rate of change problems. Leibniz’ integral, by 
contrast, is an infinite sum of differentials. Of course, both ultimately 
compute their integrals by the process of antidifferentiation; the computa- 
tional exploitation of the inverse relationship between quadrature and 
tangent problems was their key common contribution. 

Whereas Leibniz had only a peripheral interest in infinite series (apart 
from their contribution to his early motivation), the expansion of functions 
in power series was for Newton an everyday working tool that he always 
regarded as an indispensable part of his “method” of analysis. For exam- 
ple, Newton was happy to evaluate an integral or solve a differential 
equation in terms of an infinite series for its solution, but Leibniz always 
preferred a “closed form” solution. 

We have seen that Newton’s formative work on the calculus dated from 
1664-1666, while Leibniz’ analogous period was 1672-1676. However, 
Leibniz’ first publications on the calculus appeared in 1684 and 1686 (his 
Acta Eruditorum articles), whereas Newton, although he had shown 
manuscripts to colleagues in England, published nothing on the calculus 
until his Principia of 1687 and his Opticks of 1704 (with the De Quadratura 
as a mathematical appendix). 

Beginning in the late 1690’s Leibniz came under attack by followers of 
Newton who assumed that he had taken and used crucial suggestions 
(without acknowledging credit to Newton) from the letters of 1676, and 
that he had learned of Newton’s work during his brief visits to London in 
1673 and 1676 (although he and Newton never met). Eventually, inferences 
became public charges of plagiarism. Leibniz in 1711 appealed for redress 
from the Royal Society of London (of which he was a member and 
Newton the president). The Royal Society appointed a commission which 
ruled in 1712, in a decision that was evidently stage-managed by Newton, 
that Leibniz was essentially guilty as charged. 

This unfortunate controversy had less to do with mathematics than with 
nationalistic rivalry between English and continental European mathemati- 
cians (see the article by Hofmann [8], pp. 164-165 for further details). Any 
serious study of the investigations of Newton and Leibniz makes it clear 
that their respective contributions were discovered independently. 

An irony of the English “victory” in the Newton—Leibniz dispute was 
that English mathematicians, in steadfastly following Newton and refusing 
to adopt Leibniz’ analytical methods, effectively closed themselves off 
from the mainstream of progress in mathematics for the next century. 
Although Newton’s spectacular applications of mathematics to scientific 
problems inspired much of the eighteenth century progress in mathematics, 
these advances came mainly at the hands of continental mathematicians 
using the analytical machinery of Leibniz’ calculus, rather than the 
methods of Newton. 


Leonhard Euler (1707-1783) 


The eighteenth century was in mathematics a period of consolidation and 
exploitation of the great discoveries of the seventeenth century, and of 
their application to the investigation of scientific problems. The dominant 
figure of this period was Leonhard Euler, the most prolific mathematician 
of all time—his collected works amount to approximately seventy-five 
substantial volumes. The range and creativity of his fundamental contribu- 
tions, to all branches of both pure and applied mathematics, would 
perhaps justify Euler’s inclusion in the traditional short list—Archimedes, 
Newton, Gauss—of the incomparable giants of mathematics. 

Euler’s professional career was spent at the royal academies in St. 
Petersburg (1727-1741 and 1766-1783) and Berlin (1741-1766). He was 
born and educated at Basel in Switzerland, where he completed his 
university education at the age of fifteen. Although his father, a clergyman 
who had studied mathematics under James Bernoulli, preferred a theologi- 
cal career for his son, young Euler learned mathematics from John 
Bernoulli, and thereby found his true vocation. 

The famous Bernoulli brothers, James (= Jacques = Jakob, 1654-1705) 
and John (= Jean = Johann, 1667-1748), were frequent correspondents 
with Leibniz and, after his 1684-86 calculus publications, equal collaborators 
with him in the initial development of the Leibnizian calculus. It was 
James Bernoulli who introduced the word “integral” in suggesting the 
name calculus integralis instead of Leibniz’ original calculus summatorius 
for the inverse of the calculus differentialis. 

John Bernoulli wrote during 1691-1692 two small unpublished treatises 
on the differential and integral calculus. Shortly thereafter he agreed to 
teach the subject to the young Marquis de l’Hospital (1661-1704) and, in 
return for a regular salary, to communicate to l’Hospital his own mathematical 
discoveries, to be used as the Marquis saw fit. The result was the 
publication by l’Hospital in 1696 of the first differential calculus textbook, 
entitled Analyse des infiniment petits pour l’intelligence des lignes courbes 
(Analysis of the Infinitely Small for the Understanding of Curves). The 
book opens with two definitions, “variable quantities are those that 
continually increase or decrease,” and “the infinitely small part whereby a 
variable quantity is continually increased or decreased is called the differential 
of that quantity,” and two postulates, “two quantities, whose 
difference is an infinitely small quantity, may be taken (or used) indifferently 
for each other,” and “a curve may be considered as a polygon of 
an infinite number of sides, each of infinitely small length, which determine 
the curvature of the curve by the angles they make with each 
other.” On this basis the basic formulas for differentials of algebraic 
functions are derived, and applied to problems involving tangents, maxima 
and minima, and curvature. 

This first calculus text is now remembered mainly for its inclusion of a 
result of Bernoulli that is known as “l’Hospital’s rule” for indeterminate 
forms—if f(x) and g(x) are differentiable functions with f(a) = g(a) =0, 
then 


$$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$$


provided that the right-hand limit exists. L’Hospital’s argument, which is 
stated verbally without functional notation (see the English translation 
included in Struik’s source book [12], pp. 313-316), amounts simply to the 
assertion that 


$$\frac{f(a+dx)}{g(a+dx)} = \frac{f(a) + f'(a)\,dx}{g(a) + g'(a)\,dx} = \frac{f'(a)\,dx}{g'(a)\,dx} = \frac{f'(a)}{g'(a)}$$


provided that f(a)=g(a)=0. He concludes that, if the ordinate y of a 
given curve “is expressed by a fraction, the numerator and denominator of 
which do each of them become 0 when x =a,” then “if the differential of 
the numerator be found, and that is divided by the differential of the 
denominator, after having made x = a, we shall have the value of [the 
ordinate y when x = a].” 
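
The infinitesimal argument can be tried out numerically. The following Python sketch is only an illustration: the functions f(x) = e^x - 1 and g(x) = sin x, the point a = 0, and the small finite increment dx are assumptions chosen for the example and do not come from l’Hospital’s text.

```python
import math

def f(x):
    # Illustrative numerator with f(0) = 0 (an assumed example).
    return math.exp(x) - 1.0

def g(x):
    # Illustrative denominator with g(0) = 0 (an assumed example).
    return math.sin(x)

a = 0.0
dx = 1e-6  # a small finite stand-in for the infinitesimal dx

# Ratio of the two functions just beside a, where both vanish.
ratio_of_values = f(a + dx) / g(a + dx)

# Ratio of the derivatives at a: f'(x) = e^x and g'(x) = cos x.
ratio_of_derivatives = math.exp(a) / math.cos(a)

print(ratio_of_values, ratio_of_derivatives)
```

The two printed numbers agree to several decimal places, and the agreement improves as dx is made smaller.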

L’Hospital’s was the first printed textbook on the new calculus, but it 
was Euler’s two-volume Introductio in analysin infinitorum of 1748 that 
forged a new branch of mathematics—analysis, to stand alongside geome- 
try and algebra—from the concept of a function and infinite processes 
(such as summation of series) for the representation and investigation of 
functions. In the Introductio we find, for the first time, systematic treatments 
of logarithms as exponents and of the trigonometric functions as 
numerical ratios rather than line segments, and a study of the functional 
properties of the elementary transcendental functions by means of their 
infinite series expansions. The Introductio is the earliest mathematics 
textbook that can be read with comparative ease by the modern student 
(even by one who has little knowledge of Latin). Euler’s notation and 
terminology seem almost “modern” for the simple reason that he originally 
introduced so much of the notation and terminology that is still used. 

Euler’s Introductio was followed by his Institutiones calculi differentialis 
of 1755 and the three-volume Institutiones calculi integralis of 1768-1770. 
These great treatises on the differential and integral calculus provide the 
original source for much of the content and methods of modern courses 
and textbooks on calculus and differential equations. 


The Concept of a Function 


In modern mathematics courses a function from X to Y (where X and Y 
are sets of real or complex numbers) is defined to be a rule that assigns to 
each element x of the set X a unique element y=f(x) of the set Y. 
Sometimes the function f is defined in terms of the set of all pairs (x, f(x)), 
a subset of the Cartesian product set X x Y. 

Euler’s Introductio was the first work in which the function concept 
played a central and explicit role. It was the identification of functions, 
rather than curves, as the principal objects of study, that permitted the 
arithmetization of geometry, and the consequent separation of infinitesi- 
mal analysis from geometry proper. 

In seventeenth century infinitesimal analysis geometrical curves were the 
principal objects of study, and this study was carried out largely within the 
framework of Cartesian geometry. The variables associated with a particular 
curve were exclusively geometrical quantities: abscissas, ordinates, 
subtangents and subnormals, arclengths of segments of the curve, areas 
between the curve and the coordinate axes, etc. Relationships between 
these quantities were often described by means of equations, except in the 
case of transcendental relationships—ones that “transcended” description 
by means of algebraic equations; these had to be described in terms of 
verbal descriptions of geometrical constructions. However, these geometrical 
variables were viewed primarily as being associated with the curve 
itself, rather than with each other. 

In particular, the several variables associated with a curve were not 
generally viewed as depending upon some single “independent” variable. 
A partial exception was Newton’s fluxional approach in which all of the 
geometrical variables were regarded (in effect) as functions of time. In- 
deed, Newton could force upon a given variable the role of independent 
variable by choosing it to play the role of the time variable. As he put it in 
Methods of Series and Fluxions ([NP III], p. 73), 


We can, however, have no estimate of time except in so far as it is 
expounded and measured by an equable local motion, and furthermore 
quantities of the same kind alone, and so also their speeds of increase and 
decrease, may be compared one with another. For these reasons I shall, in 
what follows, have no regard to time, formally so considered, but from 
quantities propounded which are of the same kind shall suppose some one 
to increase with an equable flow: to this all the others may be referred as 
though it were time, and so by analogy the name of ‘time’ may not 
improperly be conferred upon it. And so whenever in the following you 
meet with the word ‘time’ (as I have, for clarity’s and distinction’s sake, 
on occasion woven it into my text), by that name should be understood 
not time formally considered but that other quantity through whose 
equable increase or flow time is expounded and measured. 


Nevertheless, Newton’s view in practice remained essentially geometric 
and kinematic rather than functional in character. 

Leibniz introduced the word “function” into mathematics precisely as a 
term designating the various geometrical quantities associated with a 
curve; they were the “functions” of the curve. Then, as increased emphasis 
was placed on the formulas and equations relating the functions of a curve, 
attention came naturally to be focused on their roles as the symbols 
appearing in these equations, that is, as variables depending only on the 
values of other variables and constants in equations (and thus no longer 
depending explicitly on the original curve). This gradual shift of emphasis 
led ultimately to the definition of a function given by Euler at the 
beginning of the Introductio ([5], p. 185, § 4). 


A function of variable quantity is an analytical expression composed in 
any way from this variable quantity and from numbers or constant 
quantities. 


Euler’s admissible operations for “composing analytical expressions” were 
the standard algebraic operations (including the solution of algebraic 
equations) and various enumerated transcendental processes, including 
taking limits of sequences, sums of infinite series, infinite products, etc. On 
this basis his arithmetization of infinitesimal analysis was so complete that 
no pictures or drawings appear in Volume I of his Introductio (Volume II 
deals with analytic geometry) nor in his calculus treatises. 

Euler later gave (in the preface to his Institutiones calculi differentialis [6], 
p. 4) a still broader definition that is virtually equivalent to modern 
definitions of functions. 


If some quantities so depend on other quantities that if the latter are 
changed the former undergo change, then the former quantities are called 
functions of the latter. This denomination is of broadest nature and 
comprises every method by means of which one quantity could be 
determined by others. If, therefore, x denotes a variable quantity, then all 
quantities which depend upon x in any way or are determined by it are 
called functions of it (translation quoted from [14], p. 70). 


The article by Youschkevitch [14] provides a comprehensive discussion 
of the development of functional concepts from ancient to recent times. 
See also the introduction to the article by Bos cited in the references to 
Chapter 9. 


Euler’s Exponential and Logarithmic Functions 


Euler investigates the exponential and logarithmic functions in Chapter 
VII of the Introductio ([5], pp. 122-132). In this section we outline his 
approach, emphasizing his development of infinite series expansions for 
these functions. 

Euler unhesitatingly accepts the existence of both infinitely small and 
infinitely large numbers, and uses them to such remarkable advantage that 
the modern reader’s own hesitation must be tinged with envy. His typical 
argument involves an infinitely small number ω and an infinitely large 
number i, which in this exposition we will replace by ε and N, respectively, 
thereby reserving i for the imaginary unit $\sqrt{-1}$. Euler did not introduce 
the now-standard notation $i = \sqrt{-1}$ until late in his career. Finally, we 
will write x for Euler’s usual independent variable z. 

In Chapter VI ([5], p. 106) Euler has introduced the logarithm of x with 
base a, $\log_a x$ (he writes simply lx), as that exponent y such that $a^y = x$. 
This was the first historical appearance of logarithms interpreted explicitly 
as exponents. At the beginning of Chapter VII, noting that $a^0 = 1$, he writes 


$$a^{\varepsilon} = 1 + k\varepsilon \qquad (1)$$


for an infinitely small number ε. It will turn out that k is a constant 
depending on a. 


EXERCISE 1. Interpreting k as $\lim_{\varepsilon \to 0} (1/\varepsilon)(a^{\varepsilon} - 1)$, explain why $k = \log_e a$. Hint: 
Interpret this limit as the value of the derivative of $a^x$ when x = 0. 


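Given a (finite) number x, Euler introduces the infinitely large number 
$N = x/\varepsilon$. Then 

$$\begin{aligned}
a^x = a^{N\varepsilon} &= (a^{\varepsilon})^N \\
&= (1 + k\varepsilon)^N \qquad \text{(by Eq. (1))} \\
&= \left(1 + \frac{kx}{N}\right)^N \qquad (2) \\
&= 1 + N\left(\frac{kx}{N}\right) + \frac{N(N-1)}{2!}\left(\frac{kx}{N}\right)^2 + \frac{N(N-1)(N-2)}{3!}\left(\frac{kx}{N}\right)^3 + \cdots \qquad \text{(binomial series)} \\
a^x &= 1 + kx + \frac{1}{2!}\,\frac{N-1}{N}\,k^2x^2 + \frac{1}{3!}\,\frac{(N-1)(N-2)}{N^2}\,k^3x^3 + \cdots. \qquad (3)
\end{aligned}$$

Because N is infinitely large, he assumes that 

$$1 = \frac{N-1}{N} = \frac{N-2}{N} = \cdots.$$

Consequently Equation (3) becomes 

$$a^x = 1 + \frac{kx}{1!} + \frac{k^2x^2}{2!} + \frac{k^3x^3}{3!} + \cdots. \qquad (4)$$

Substituting x = 1, he obtains the relationship between a and k, 

$$a = 1 + \frac{k}{1!} + \frac{k^2}{2!} + \frac{k^3}{3!} + \cdots. \qquad (5)$$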


Euler now introduces his famous number e as the value of a for which 
k=1, 


$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots.$$


He immediately identifies e as the base for natural or hyperbolic loga- 
rithms, and writes out its decimal expansion to 23 places, 


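$$e = 2.71828182845904523536028.$$

Equation (2) then gives 

$$e^x = \left(1 + \frac{x}{N}\right)^N, \qquad (6)$$

which we may interpret as 

$$e^x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n, \qquad (7)$$

the usual modern definition. With k = 1, Equation (4) finally gives 

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots. \qquad (8)$$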
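
Equations (7) and (8) can be compared numerically. In the Python sketch below, a large finite n stands in for Euler’s infinite N; the function names, the sample value x = 1, and the particular choices n = 10**6 and 20 series terms are assumptions made for illustration.

```python
import math

def power_form(x, n):
    # Equation (7) with a large finite n in place of the infinite N.
    return (1.0 + x / n) ** n

def series_form(x, terms):
    # Partial sum of Equation (8): 1 + x/1! + x^2/2! + ...
    total, term = 0.0, 1.0
    for i in range(terms):
        total += term
        term *= x / (i + 1)
    return total

x = 1.0
print(power_form(x, 10**6))   # approaches e slowly as n grows
print(series_form(x, 20))     # the series converges much faster
print(math.e)
```

The contrast between the two rates of convergence suggests why the series, rather than the limit, was the practical route to a many-place value of e.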


Turning to the logarithm, Euler writes 

$$1 + y = a^x = a^{N\varepsilon} = (1 + k\varepsilon)^N,$$

so $\log_a(1 + y) = N\varepsilon$. Then 

$$1 + k\varepsilon = (1 + y)^{1/N},$$

so $\varepsilon = ((1 + y)^{1/N} - 1)/k$, and it follows that 

$$\log_a(1 + y) = N\varepsilon = \frac{N}{k}\left[(1 + y)^{1/N} - 1\right]. \qquad (9)$$

Replacing a with e (so k = 1) and y with x, we obtain 

$$\log(1 + x) = N\left[(1 + x)^{1/N} - 1\right], \qquad (10)$$

which we may interpret as 

$$\log(1 + x) = \lim_{n \to \infty} n\left[(1 + x)^{1/n} - 1\right]. \qquad (11)$$


EXERCISE 2. With a = 1 + x and n = 1/h, Equation (11) becomes 

$$\log a = \lim_{h \to 0} \frac{a^h - 1}{h}.$$

Explain why this limit follows from the computation in Exercise 1. 


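Euler obtains Mercator’s series for log(1 + x) by using the binomial 
series to expand the $(1 + x)^{1/N}$ in (10). 

$$\begin{aligned}
(1 + x)^{1/N} &= 1 + \frac{1}{N}x + \frac{\frac{1}{N}\left(\frac{1}{N} - 1\right)}{2!}x^2 + \frac{\frac{1}{N}\left(\frac{1}{N} - 1\right)\left(\frac{1}{N} - 2\right)}{3!}x^3 + \cdots \\
&= 1 + \frac{1}{N}x - \frac{1}{2!}\,\frac{N-1}{N^2}\,x^2 + \frac{1}{3!}\,\frac{(N-1)(2N-1)}{N^3}\,x^3 - \cdots.
\end{aligned}$$

Setting 

$$\frac{N-1}{N} = 1, \qquad \frac{2N-1}{N} = 2, \qquad \text{etc.},$$

because N is infinitely large, he obtains 

$$\begin{aligned}
\log(1 + x) &= N\left[(1 + x)^{1/N} - 1\right] \\
&= x - \frac{1}{2!}\,\frac{N-1}{N}\,x^2 + \frac{1}{3!}\,\frac{(N-1)(2N-1)}{N^2}\,x^3 - \cdots \\
\log(1 + x) &= x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots. \qquad (12)
\end{aligned}$$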
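
Equations (11) and (12) can likewise be compared with a library logarithm. The Python sketch below is a minimal illustration; the function names are hypothetical, a large finite n stands in for the infinite N, and x = 0.5 is an assumed sample value with |x| < 1, where Mercator’s series converges.

```python
import math

def limit_form(x, n):
    # Equation (11) with a large finite n in place of the infinite N.
    return n * ((1.0 + x) ** (1.0 / n) - 1.0)

def mercator(x, terms):
    # Partial sum of Equation (12): x - x^2/2 + x^3/3 - ...
    return sum((-1) ** (j + 1) * x ** j / j for j in range(1, terms + 1))

x = 0.5
print(limit_form(x, 10**6))
print(mercator(x, 40))
print(math.log(1.0 + x))
```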


EXERCISE 3. Replace x with -x in (12), and then subtract logarithms to obtain 

$$\log\frac{1 + x}{1 - x} = 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots\right). \qquad (13)$$


EXERCISE 4. Note that, if the k in Equation (9) is carried through, the derivation of 
equation (12) gives 

$$\log_a(1 + y) = \frac{1}{k}\left(y - \tfrac{1}{2}y^2 + \tfrac{1}{3}y^3 - \cdots\right).$$

On page 127 Euler substitutes a = 10 and y = 9 to obtain the value 

$$k = \frac{9}{1} - \frac{9^2}{2} + \frac{9^3}{3} - \frac{9^4}{4} + \cdots$$

for base 10. Do you believe this? 


Euler’s Trigonometric Functions and Expansions 


Prior to Euler, sines and cosines were lengths of line segments relative to a 
circle of given radius R. The sine of the angle A was half the chord of the 
circle subtended by a central angle 2A, and the cosine of A was the length 
of the perpendicular from the center to this chord. Thus, with R = 10000, 
sin 30° = 5000.00 and cos 30° = 8660.25. 

In Chapter VIII of the Introductio ([5], pp. 133-152), Euler defined (and 
standardized) the trigonometric functions as follows: sin x and cos x de- 
note the sine and cosine of the central angle in a unit circle that subtends 
an arc of length x. This amounts to saying that sin x and cos x are the sine 
and cosine of an angle of x radians in a circle of radius one. The 
fundamental identity 


$$\sin^2 x + \cos^2 x = 1$$


follows at once. Euler immediately noted the periodicity properties of the 
sine and cosine, and proceeded to list the standard formulas, e.g., 


$$\begin{aligned}
\sin(y \pm z) &= \sin y \cos z \pm \cos y \sin z \\
\cos(y \pm z) &= \cos y \cos z \mp \sin y \sin z \qquad (14)
\end{aligned}$$


in precisely the forms that trigonometry textbooks have included them ever 
since. Next he indicated (on page 140) the inductive derivation of “De 
Moivre’s identity” 


$$(\cos z \pm i \sin z)^n = \cos nz \pm i \sin nz, \qquad (15)$$


where $i = \sqrt{-1}$ and n is a positive integer. 


EXERCISE 5. Prove De Moivre’s identity by induction on n. 


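Now let ε be an infinitely small number and N an infinitely large integer 
(!). The two choices of signs in (15) give 

$$\cos N\varepsilon + i \sin N\varepsilon = (\cos\varepsilon + i \sin\varepsilon)^N$$

and 

$$\cos N\varepsilon - i \sin N\varepsilon = (\cos\varepsilon - i \sin\varepsilon)^N.$$

Addition and subtraction then give 

$$\cos N\varepsilon = \frac{1}{2}\left[(\cos\varepsilon + i \sin\varepsilon)^N + (\cos\varepsilon - i \sin\varepsilon)^N\right]$$

and 

$$\sin N\varepsilon = \frac{1}{2i}\left[(\cos\varepsilon + i \sin\varepsilon)^N - (\cos\varepsilon - i \sin\varepsilon)^N\right]. \qquad (16)$$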


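Euler expands the right-hand sides of these equations using the binomial 
series to obtain 

$$\cos N\varepsilon = \cos^N\varepsilon - \frac{N(N-1)}{2!}\cos^{N-2}\varepsilon\,\sin^2\varepsilon + \frac{N(N-1)(N-2)(N-3)}{4!}\cos^{N-4}\varepsilon\,\sin^4\varepsilon - \cdots$$

and 

$$\sin N\varepsilon = N\cos^{N-1}\varepsilon\,\sin\varepsilon - \frac{N(N-1)(N-2)}{3!}\cos^{N-3}\varepsilon\,\sin^3\varepsilon + \frac{N(N-1)(N-2)(N-3)(N-4)}{5!}\cos^{N-5}\varepsilon\,\sin^5\varepsilon - \cdots.$$

Finally, writing $N\varepsilon = x$, and substituting $\cos\varepsilon = 1$, $\sin\varepsilon = \varepsilon$, $N = N - 1 = N - 2 = \cdots$ 
because ε is infinitely small and N is infinitely large, he 
obtains the trigonometric series 

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots \qquad (17)$$

and 

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots. \qquad (18)$$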


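Euler obtains his famous relation between the exponential and trigonometric 
functions by substituting $\varepsilon = x/N$ into Equations (16), obtaining 

$$\cos x = \frac{1}{2}\left[\left(1 + \frac{ix}{N}\right)^N + \left(1 - \frac{ix}{N}\right)^N\right]$$

$$\sin x = \frac{1}{2i}\left[\left(1 + \frac{ix}{N}\right)^N - \left(1 - \frac{ix}{N}\right)^N\right]$$

because $\cos\varepsilon = 1$ and $\sin\varepsilon = \varepsilon = x/N$. But remember that, by Equation (6), 

$$\left(1 + \frac{ix}{N}\right)^N = e^{ix}$$

and 

$$\left(1 - \frac{ix}{N}\right)^N = e^{-ix}.$$

These formulas then say 

$$\cos x = \frac{e^{ix} + e^{-ix}}{2} \qquad (19)$$

and 

$$\sin x = \frac{e^{ix} - e^{-ix}}{2i}. \qquad (20)$$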
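
The passage from (16) to (19) and (20) can be imitated with complex floating-point arithmetic. The Python sketch below is an illustration under stated assumptions: the helper name euler_power, the sample value x = 1, and the finite n = 10**6 standing in for the infinite N are all choices made here, not Euler’s.

```python
import cmath
import math

def euler_power(x, n):
    # (1 + ix/n)^n with a large finite n in place of the infinite N.
    return (1.0 + 1j * x / n) ** n

x = 1.0
n = 10**6
plus = euler_power(x, n)    # stands in for e^{ix}
minus = euler_power(-x, n)  # stands in for e^{-ix}

cos_approx = ((plus + minus) / 2).real   # Equation (19)
sin_approx = ((plus - minus) / 2j).real  # Equation (20)

print(cos_approx, math.cos(x))
print(sin_approx, math.sin(x))
print(abs(plus - cmath.exp(1j * x)))
```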


EXERCISE 6. Deduce from (19) and (20) that 

$$e^{\pm ix} = \cos x \pm i \sin x. \qquad (21)$$


The following exercises outline Euler’s derivation ([5], pp. 148-150) of 
Gregory’s inverse tangent series. 


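EXERCISE 7. Substitute x = z/N (where z is finite) into (20) to obtain 

$$\frac{z}{N} = \frac{1}{2i}\left[(e^{iz})^{1/N} - (e^{-iz})^{1/N}\right].$$

Remembering that $\log y = N(y^{1/N} - 1)$, so 

$$y^{1/N} = 1 + \frac{1}{N}\log y,$$

conclude that 

$$z = \frac{1}{2i}\left[\log(e^{iz}) - \log(e^{-iz})\right].$$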


EXERCISE 8. Substitute Euler’s relation (21) into the result of Exercise 7 to deduce 
the identity 

$$z = \frac{1}{2i}\log\frac{1 + i\tan z}{1 - i\tan z}. \qquad (22)$$


EXERCISE 9. Substitute the logarithmic series (13) on the right-hand side of (22), 
with x = i tan z, to obtain 

$$z = \tan z - \frac{\tan^3 z}{3} + \frac{\tan^5 z}{5} - \cdots.$$

With $z = \tan^{-1} t$, this is 

$$\tan^{-1} t = t - \frac{t^3}{3} + \frac{t^5}{5} - \cdots.$$


Not the least interesting feature of Euler’s treatment of elementary transcendental 
functions in the Introductio is the fact that he derives their 
infinite series expansions without any use of calculus. In the Calculi 
Differentialis he then used these expansions to derive the Leibnizian 
differentials of the elementary functions. 

His approach is simply to delete all higher-order infinitesimals 
$(dx)^2, (dx)^3, \ldots$, in an appropriate expansion of the differential dy of a 
given function y of x. For example, if $y = x^n$, then the binomial expansion 
gives 

$$\begin{aligned}
dy &= (x + dx)^n - x^n \\
&= \left(x^n + nx^{n-1}\,dx + \tfrac{1}{2}n(n-1)x^{n-2}\,dx^2 + \cdots\right) - x^n \\
&= nx^{n-1}\,dx + \tfrac{1}{2}n(n-1)x^{n-2}\,dx^2 + \cdots \\
dy &= nx^{n-1}\,dx.
\end{aligned}$$

The product rule is derived in the usual way, 

$$d(pq) = (p + dp)(q + dq) - pq = p\,dq + q\,dp + dp\,dq = p\,dq + q\,dp.$$
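
The effect of deleting higher-order infinitesimals, as in the power-rule computation above, can be observed with small finite increments. The Python sketch below is only an illustration; the function names and the sample values x = 2 and n = 5 are assumptions.

```python
def exact_increment(x, n, dx):
    # The full difference (x + dx)^n - x^n, before anything is discarded.
    return (x + dx) ** n - x ** n

def euler_differential(x, n, dx):
    # What remains after deleting the terms in dx^2, dx^3, ...
    return n * x ** (n - 1) * dx

x, n = 2.0, 5
for dx in (1e-2, 1e-4, 1e-6):
    exact = exact_increment(x, n, dx)
    kept = euler_differential(x, n, dx)
    # The relative size of the discarded terms shrinks roughly in proportion to dx.
    print(dx, (exact - kept) / kept)
```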


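To derive the quotient rule ([6], p. 109), Euler first expands 1/(q + dq) as a 
geometric series, 

$$\frac{1}{q + dq} = \frac{1}{q}\left(1 - \frac{dq}{q} + \frac{dq^2}{q^2} - \cdots\right) = \frac{1}{q} - \frac{dq}{q^2}.$$

Then 

$$d\left(\frac{p}{q}\right) = (p + dp)\left(\frac{1}{q} - \frac{dq}{q^2}\right) - \frac{p}{q} = \frac{dp}{q} - \frac{p\,dq}{q^2} - \frac{dp\,dq}{q^2} = \frac{q\,dp - p\,dq}{q^2}.$$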


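To differentiate the logarithm ([6], p. 122), Euler writes 

$$\begin{aligned}
d(\log x) &= \log(x + dx) - \log x = \log\left(1 + \frac{dx}{x}\right) \\
&= \frac{dx}{x} - \frac{dx^2}{2x^2} + \frac{dx^3}{3x^3} - \cdots \qquad \text{(by Eq. (12))} \\
d(\log x) &= \frac{dx}{x}. \qquad (23)
\end{aligned}$$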


As an example of Euler’s occasional flights of fancy, here is an alternative 
derivation of (23) that he gives. From Equation (11) he writes 


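$$\log x = \frac{x^{\varepsilon} - 1}{\varepsilon}$$

where ε is an infinitely small number. The power rule then gives 

$$d(\log x) = \frac{\varepsilon x^{\varepsilon - 1}\,dx}{\varepsilon} = \frac{x^{\varepsilon}\,dx}{x} = \frac{dx}{x}$$

because $x^{\varepsilon} = x^0 = 1$ since ε is infinitely small! 

He uses the exponential series (8) to differentiate $e^x$, 

$$\begin{aligned}
d(e^x) &= e^{x + dx} - e^x = e^x(e^{dx} - 1) \\
&= e^x\left(dx + \frac{dx^2}{2!} + \frac{dx^3}{3!} + \cdots\right) \\
d(e^x) &= e^x\,dx. \qquad (24)
\end{aligned}$$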


EXERCISE 10. Given functions p and q of x, differentiate $y = p^q$ as follows. First 
expand 

$$y + dy = (p + dp)^{q + dq}$$

by the binomial series to obtain 

$$dy = p^q(p^{dq} - 1) + (q + dq)\,p^{q + dq - 1}\,dp. \qquad (*)$$

Then expand $p^{dq} - 1 = e^{(\log p)\,dq} - 1$ by the exponential series to obtain 

$$p^{dq} - 1 = (\log p)\,dq.$$

Finally substitute this into (*) to obtain 

$$d(p^q) = p^q(\log p)\,dq + q\,p^{q - 1}\,dp.$$


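To differentiate sin x ([6], pp. 137-138), Euler uses the trigonometric 
series (17) and (18). 

$$\begin{aligned}
d(\sin x) &= \sin(x + dx) - \sin x \\
&= \sin x \cos dx + \cos x \sin dx - \sin x \\
&= (\sin x)\left(-\frac{dx^2}{2!} + \frac{dx^4}{4!} - \cdots\right) + (\cos x)\left(dx - \frac{dx^3}{3!} + \cdots\right) \\
d(\sin x) &= \cos x\,dx. \qquad (25)
\end{aligned}$$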


EXERCISE 11. Show similarly that d(cos x) = — sin x dx. 


EXERCISE 12. Instead of using the quotient rule to differentiate tan x = sin x/cos x, 
Euler on page 139 starts with 

$$\tan(x + dx) = \frac{\tan x + \tan dx}{1 - \tan x \tan dx},$$

using the addition formula for the tangent. Carry through this approach to obtain 
$d(\tan x) = \sec^2 x\,dx$. Hint: tan dx = sin dx/cos dx = dx. 


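To differentiate the inverse sine, $y = \sin^{-1} x$, Euler ([6], p. 132) starts with 

$$e^{iy} = \cos y + i \sin y = \sqrt{1 - x^2} + ix, \qquad \text{(Eq. (21))}$$

whence 

$$y = \frac{1}{i}\log\left(\sqrt{1 - x^2} + ix\right).$$

It follows that 

$$\begin{aligned}
dy &= \frac{1}{i}\cdot\frac{-x(1 - x^2)^{-1/2}\,dx + i\,dx}{\sqrt{1 - x^2} + ix} \\
&= \frac{1}{i}\cdot\frac{i\sqrt{1 - x^2} - x}{\sqrt{1 - x^2} + ix}\cdot\frac{dx}{\sqrt{1 - x^2}} \\
d(\sin^{-1} x) &= \frac{dx}{\sqrt{1 - x^2}}. \qquad (26)
\end{aligned}$$

If $y = \tan^{-1} x$, then $\sin y = x/\sqrt{1 + x^2}$, so it follows that 

$$d(\tan^{-1} x) = d\left(\sin^{-1}\frac{x}{\sqrt{1 + x^2}}\right).$$


EXERCISE 13. Apply Eq. (26) to conclude that 

$$d(\tan^{-1} x) = \frac{dx}{1 + x^2}. \qquad (27)$$

This is how Euler does it ([6], pp. 133-134). 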


EXERCISE 14. Write Equation (22) in the form 

$$\tan^{-1} x = \frac{1}{2i}\log\frac{1 + ix}{1 - ix},$$

and then differentiate the right-hand side to obtain (27). 


EXERCISE 15. Derive Equation (27) by termwise differentiation of Gregory’s inverse 
tangent series, using the geometric series to recognize the result. 


Euler’s admission of complex numbers on an equal footing with real 
numbers as functional arguments and values was a significant step. In 
particular, his slick use of complex logarithms in the preceding differential 
computations deserves comment. During the early eighteenth century there 
was a dispute between Leibniz and John Bernoulli over the meaning and 
existence of logarithms of negative and imaginary numbers. Euler sought 
to settle this dispute in a 1749 paper entitled “De la controverse entre Mrs. 
Leibniz et Bernoulli sur les logarithmes des nombres negatifs et im- 
aginaires” ([7], pp. 195-232). 

On page 210 he asserts that every number x has infinitely many 
logarithms. His argument is that, since 


$$\log x = N x^{1/N} - N \qquad \text{(Eq. (10))}$$

for N infinitely large, the quantity $n x^{1/n} - n$ approaches log x as the 
positive integer n increases without bound. But each number x has n 
distinct nth roots, or values for $x^{1/n}$. For example, if $\sqrt[n]{x}$ denotes the 
ordinary positive nth root of the positive number x, then 

$$\left(\sqrt[n]{x}\,e^{2\pi ki/n}\right)^n = x e^{2\pi ki} = x$$

for every integer k, because $e^{2\pi ki} = \cos 2k\pi + i \sin 2k\pi = 1$. Thus x has the 
n distinct nth roots 

$$\sqrt[n]{x}\,e^{2\pi ki/n}, \qquad k = 0, 1, 2, \ldots, n - 1.$$

Euler concludes that $x^{1/N}$ should have infinitely many values if N is 
infinitely large, thereby producing infinitely many values of log x. For 
example, he points out (page 213) that, if log a denotes the ordinary real 
logarithm of the positive number a, then each of the infinitely many values 

$$(\log a) + 2\pi ki, \qquad k = 0, \pm 1, \pm 2, \ldots$$

is a logarithm of a, because 

$$\exp[(\log a) + 2\pi ki] = e^{\log a}e^{2\pi ki} = a.$$
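
Both facts used in this argument are easy to confirm in floating point. In the Python sketch below, the sample value a = 2, the range of k, and the choice n = 5 are assumptions made for illustration.

```python
import cmath
import math

a = 2.0  # an assumed positive sample value
real_log = math.log(a)

# Each value log a + 2*pi*k*i is sent back to a by the exponential.
for k in range(-2, 3):
    candidate = real_log + 2j * math.pi * k
    print(k, candidate, cmath.exp(candidate))

# The n distinct nth roots of a, in the form given above, for n = 5.
n = 5
roots = [a ** (1.0 / n) * cmath.exp(2j * math.pi * k / n) for k in range(n)]
print([abs(r ** n - a) for r in roots])
```

The second printed list contains only rounding-level residues, so each of the five distinct complex numbers is a fifth root of a.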


EXERCISE 16. Show that $i(\pi/2 + 2k\pi)$ is a logarithm of $i = \sqrt{-1}$ for each integer 
k. Conclude that 

$$i^i = e^{i \log i} = e^{-\pi/2} = 0.20788.$$


We have seen that infinite power series expansions of particular functions 
played a central role in the analysis of Euler (as well as in the work of 
some of his predecessors, especially including Newton). Early in the 
eighteenth century it was discovered by several people that the series 
expansions of the various elementary transcendental functions are all 
special cases of the general expansion that is now called Taylor’s series 
(which will be discussed in the following section). The discovery of this 
quite general approach to infinite series expansions was closely associated 
with the development of interpolation methods. 

The construction of mathematical tables (such as tables of logarithms or 
trigonometric functions) during the seventeenth century focused attention 
on the problem of accurately interpolating between tabulated values. The 
goal was to lessen the considerable labor of constructing a table (e.g., of 
logarithms) by directly computing only a limited number of values, after- 
wards filling in the remaining entries by interpolation between these 
directly computed values. 

For example, suppose we want to construct a table of 5-place natural 
logarithms of numbers between 1 and 100, at intervals of 0.1. We saw in 
Chapter 6 (in the section on Newton’s logarithmic computations) that a list 
of the logarithms of the first several prime integers can be built up on the 
basis of comparatively few direct logarithmic computations using Merca- 
tor’s series. Once the logarithms of the 25 primes less than 100 have been 
found in this way, the logarithms of the remaining integers up to 100 can 
be obtained by addition (e.g., log 40 =3 log 2 + log 5). It then remains only 
to interpolate 9 logarithms between each pair of logarithms of successive 
integers. 

Unfortunately, the familiar process of linear interpolation is not 
sufficiently accurate for this purpose. For example, log 40 = 3.68888 and 
log 41 = 3.71357, so linear interpolation gives 


log 40.4 = 3.68888 + (0.4)(3.71357 — 3.68888) = 3.69876, 


whereas actually log 40.4 = 3.69883. Thus two decimal places of accuracy 
have been lost in the process of linear interpolation. 

It is convenient to describe linear interpolation as follows. Given the 
values $y_0 = f(x_0)$ and $y_1 = f(x_1)$, where $x_1 - x_0 = \Delta x$, consider the first 
difference $\Delta y_0 = y_1 - y_0$. Then linear interpolation is defined by 

$$f(x_0 + s\Delta x) = y_0 + s\Delta y_0. \qquad (28)$$

Procedures for more accurate interpolations go back at least to Henry 
Briggs’ computations of common logarithms around 1620. Given function 
values $y_0 = f(x_0)$, $y_1 = f(x_1)$, $y_2 = f(x_2)$ with $x_1 - x_0 = x_2 - x_1 = \Delta x$, he used 
both the first differences $\Delta y_0 = y_1 - y_0$, $\Delta y_1 = y_2 - y_1$ and the second difference 

$$\Delta^2 y_0 = \Delta y_1 - \Delta y_0 = y_2 - 2y_1 + y_0.$$

His interpolation for $f(x_0 + s\Delta x)$ was given by 

$$f(x_0 + s\Delta x) = y_0 + s\Delta y_0 + \tfrac{1}{2}s(s - 1)\Delta^2 y_0. \qquad (29)$$

For example, the values tabulated below yield 

$$\log 40.4 = 3.68888 + (0.4)(0.02469) + \tfrac{1}{2}(0.4)(-0.6)(-0.00059) = 3.69883$$


which is accurate to five places. An excellent discussion of Briggs’ computations 
can be found in Goldstine’s book on the history of numerical 
analysis ([10], pp. 13-30). 


i    x_i    y_i        Δy_i       Δ^2 y_i

0    40     3.68888
                       0.02469
1    41     3.71357               -0.00059
                       0.02410
2    42     3.73767
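
The two interpolation rules can be compared directly on the tabulated values. The Python sketch below is a minimal illustration; the function names linear and briggs are hypothetical labels for Formulas (28) and (29), and the library logarithm is used only as a check.

```python
import math

# Tabulated five-place values from the text: natural logs of 40, 41, 42.
y0, y1, y2 = 3.68888, 3.71357, 3.73767

d1_y0 = y1 - y0          # first difference, about 0.02469
d1_y1 = y2 - y1          # first difference, about 0.02410
d2_y0 = d1_y1 - d1_y0    # second difference, about -0.00059

def linear(s):
    # Formula (28)
    return y0 + s * d1_y0

def briggs(s):
    # Formula (29)
    return y0 + s * d1_y0 + 0.5 * s * (s - 1.0) * d2_y0

s = 0.4  # 40.4 = 40 + 0.4 * 1
print(round(linear(s), 5), round(briggs(s), 5), round(math.log(40.4), 5))
```

The single second-difference term supplies the correction that linear interpolation lacks.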


EXERCISE 17. (Cf. [10], p. 27). Show that Formula (29) gives the following interpolation 
of 9 values between $y_0 = f(x_0)$ and $y_1 = f(x_1)$. 

x                   y

x_0                 y_0
x_0 + (0.1)Δx       y_0 + (0.1)Δy_0 - (0.045)Δ^2 y_0
x_0 + (0.2)Δx       y_0 + (0.2)Δy_0 - (0.080)Δ^2 y_0
x_0 + (0.3)Δx       y_0 + (0.3)Δy_0 - (0.105)Δ^2 y_0
x_0 + (0.4)Δx       y_0 + (0.4)Δy_0 - (0.120)Δ^2 y_0
x_0 + (0.5)Δx       y_0 + (0.5)Δy_0 - (0.125)Δ^2 y_0
x_0 + (0.6)Δx       y_0 + (0.6)Δy_0 - (0.120)Δ^2 y_0
x_0 + (0.7)Δx       y_0 + (0.7)Δy_0 - (0.105)Δ^2 y_0
x_0 + (0.8)Δx       y_0 + (0.8)Δy_0 - (0.080)Δ^2 y_0
x_0 + (0.9)Δx       y_0 + (0.9)Δy_0 - (0.045)Δ^2 y_0
x_1                 y_1


A generalization of Formulas (28) and (29) that is now known as 
Newton’s forward-difference formula was stated without proof by Newton 
under Lemma V of Book III of the Principia Mathematica. It refers to 
interpolation between the values $y_0, y_1, \ldots, y_n$ of a function f(x) given at 
n + 1 equally spaced points $x_0, x_1, \ldots, x_n$. For a concise statement of this 
formula, it is convenient to extend the difference notation (which neither 
Briggs nor Newton used) to higher order differences by defining $\Delta y_j = y_{j+1} - y_j$ and 

$$\Delta^{k+1} y_j = \Delta(\Delta^k y_j) = \Delta^k y_{j+1} - \Delta^k y_j$$

recursively. For example, 

$$\Delta^2 y_1 = \Delta y_2 - \Delta y_1 = (y_3 - y_2) - (y_2 - y_1) = y_3 - 2y_2 + y_1$$

and 

$$\begin{aligned}
\Delta^3 y_0 &= \Delta^2 y_1 - \Delta^2 y_0 \\
&= (y_3 - 2y_2 + y_1) - (y_2 - 2y_1 + y_0) \\
&= y_3 - 3y_2 + 3y_1 - y_0.
\end{aligned}$$

These successive differences are commonly tabulated as in the array shown 
below. Each entry is the difference of the two entries immediately to its 
left. 


y_0
        Δy_0
y_1             Δ^2 y_0
        Δy_1            Δ^3 y_0
y_2             Δ^2 y_1         Δ^4 y_0
        Δy_2            Δ^3 y_1         Δ^5 y_0
y_3             Δ^2 y_2         Δ^4 y_1
        Δy_3            Δ^3 y_2
y_4             Δ^2 y_3
        Δy_4
y_5


EXERCISE 18. Show by induction on k that 

$$\Delta^k y_j = \sum_{i=0}^{k} (-1)^i \binom{k}{i}\,y_{k+j-i} = y_{k+j} - k\,y_{k+j-1} + \tfrac{1}{2}k(k-1)\,y_{k+j-2} - \cdots + (-1)^k y_j,$$

where $\binom{k}{i}$ is the usual binomial coefficient. 


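In this difference notation, Newton’s forward-difference formula for the 
interpolated value $f(x_0 + s\Delta x)$, where $\Delta x = x_{i+1} - x_i$, is 

$$\begin{aligned}
f(x_0 + s\Delta x) &= \sum_{i=0}^{n} \binom{s}{i}\Delta^i y_0 \\
&= y_0 + s\Delta y_0 + \frac{s(s-1)}{2!}\Delta^2 y_0 + \frac{s(s-1)(s-2)}{3!}\Delta^3 y_0 + \cdots + \frac{s(s-1)\cdots(s-n+1)}{n!}\Delta^n y_0. \qquad (30)
\end{aligned}$$

The same formula (except for notation) had been stated (also without 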
proof) by James Gregory in a letter to John Collins dated 23 November 
1670 (see Newton’s correspondence [NC I], p. 46). The first published 
derivation of an interpolation formula equivalent to (30) appeared in 
Newton’s Methodus Differentialis of 1711 (for an English translation see 
Newton’s works [NW II], pp. 165-173). However, Newton’s pioneering 
investigations of finite difference interpolation, including divided dif- 
ferences and the so-called Newton-Stirling and Newton—Bessel central-dif- 
ference formulas, were carried out in 1675-1676; this work is presented by 
Whiteside in Newton’s mathematical papers ([NP IV], pp. 3-73).


EXERCISE 19. Given the following five values of the function $y = x^3$, set up a
difference array to compute $\Delta y_0 = 61000$, $\Delta^2 y_0 = 30000$, $\Delta^3 y_0 = 6000$,
$\Delta^4 y_0 = 0$. Substitute these values with $s = 0.4$ into (30) to calculate
$44^3 = 85184$.


    $i$      0       1        2        3        4
    $x_i$    40      50       60       70       80
    $y_i$    64000   125000   216000   343000   512000
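
The arithmetic of Exercise 19 can be checked mechanically. The following Python sketch is an illustration only: the function names `forward_differences` and `newton_forward` are chosen here and come from none of the sources cited. It builds the leading differences $y_0, \Delta y_0, \Delta^2 y_0, \ldots$ of the difference array and sums formula (30) in exact rational arithmetic.

```python
from fractions import Fraction


def forward_differences(ys):
    """Return [y_0, Δy_0, Δ²y_0, ...], the top diagonal of the difference array."""
    column = list(ys)
    leading = [column[0]]
    while len(column) > 1:
        # Each new column holds the differences of adjacent entries of the last.
        column = [b - a for a, b in zip(column, column[1:])]
        leading.append(column[0])
    return leading


def newton_forward(ys, s):
    """Evaluate formula (30): the sum over i of C(s, i) * Δ^i y_0."""
    total = Fraction(0)
    coeff = Fraction(1)  # generalized binomial coefficient C(s, 0)
    for i, diff in enumerate(forward_differences(ys)):
        total += coeff * diff
        coeff = coeff * (s - i) / (i + 1)  # C(s, i+1) obtained from C(s, i)
    return total


ys = [64000, 125000, 216000, 343000, 512000]  # x^3 at x = 40, 50, ..., 80
print(forward_differences(ys))                # [64000, 61000, 30000, 6000, 0]
print(newton_forward(ys, Fraction(2, 5)))     # 85184
```

With $s = 2/5$ the four nonzero terms are 64000 + 24400 - 3600 + 384 = 85184, in agreement with the exercise.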


In a 1670 manuscript that is discussed in the papers by Gibson ([9],
pp. 4-5), Turnbull ([13], pp. 162-164), and Dehn and Hellinger ([4],
pp. 152-153), Gregory apparently derived the binomial expansion from his 
interpolation formula in the manner indicated by the following exercise. 


EXERCISE 20. Consider the function $f(x) = (1+a)^x$. Let $x_j = j = 0, 1, 2, \ldots, n$, so
$\Delta x = 1$ and $y_j = (1+a)^j$. Show by induction on $k$ that

\[ \Delta^k y_j = a^k y_j \]

for each $k = 0, 1, 2, \ldots, n$. Substitute $\Delta^k y_0 = a^k$ into (30) to obtain

\[ (1+a)^s = 1 + sa + \frac{s(s-1)}{2!}a^2 + \cdots + \frac{s(s-1)\cdots(s-n+1)}{n!}a^n, \]

the first $n+1$ terms of the binomial expansion of $(1+a)^s$.


The above-mentioned Principia lemma reads, “To find a curved line of 
the parabolic kind [i.e., a polynomial] which shall pass through any given 
number of points.” This is the basic idea of Gregory—Newton interpolation 
—to determine a polynomial 


\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \]

of degree $n$ which agrees with $f(x)$ at the $n+1$ equally spaced points
$x_0, x_1, \ldots, x_n$, that is, $p(x_i) = f(x_i) = y_i$ for $i = 0, 1, \ldots, n$. Then, for any
intermediate point $x$, the value $p(x)$ can be used as an interpolated
approximation to the value $f(x)$.


It is somewhat easier to solve for the coefficients if $p(x)$ is written in the
form

\[ p(x) = A_0 + A_1(x - x_0) + A_2(x - x_0)(x - x_1)
   + \cdots + A_n(x - x_0)(x - x_1)\cdots(x - x_{n-1}). \tag{31} \]

If $x - x_0 = s\Delta x$, then

\[ x - x_1 = (s-1)\Delta x, \quad x - x_2 = (s-2)\Delta x, \quad \text{etc.}, \]

so (31) becomes

\[ p(x_0 + s\Delta x) = B_0 + B_1 s + B_2 s(s-1) + \cdots + B_n s(s-1)\cdots(s-n+1), \tag{32} \]

where $B_k = A_k(\Delta x)^k$. In order to obtain the interpolation formula (30) it
suffices to show that

\[ B_k = \frac{\Delta^k y_0}{k!}, \qquad k = 0, 1, \ldots, n. \tag{33} \]


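The requirement that $p(x_0 + k\Delta x) = y_k$ for each $k = 0, 1, \ldots, n$ gives the
equations

\[
\begin{aligned}
y_0 &= B_0 \\
y_1 &= B_0 + B_1 \\
y_2 &= B_0 + 2B_1 + 2B_2 \\
y_3 &= B_0 + 3B_1 + 6B_2 + 6B_3 \\
    &\;\;\vdots \\
y_n &= B_0 + nB_1 + n(n-1)B_2 + n(n-1)(n-2)B_3 + \cdots + n!\,B_n.
\end{aligned}
\tag{34}
\]


EXERCISE 21. Solve the first four equations in system (34) to obtain

\[ B_0 = y_0, \quad B_1 = \Delta y_0, \quad B_2 = \frac{\Delta^2 y_0}{2!}, \quad B_3 = \frac{\Delta^3 y_0}{3!}. \]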


The solution of system (34) is completed by induction. Assuming that
$B_k = \Delta^k y_0/k!$ for $k < n$, the last equation in (34) gives

\[ y_n = y_0 + n\Delta y_0 + \frac{n(n-1)}{2!}\Delta^2 y_0 + \cdots + n\Delta^{n-1} y_0 + n!\,B_n. \]

Since $y_1 = y_0 + \Delta y_0 = (1+\Delta)y_0, \ldots, y_{k+1} = y_k + \Delta y_k = (1+\Delta)y_k$, it follows
easily by induction on $n$ that

\[ y_n = (1+\Delta)^n y_0
   = y_0 + n\Delta y_0 + \frac{n(n-1)}{2!}\Delta^2 y_0 + \cdots + n\Delta^{n-1} y_0 + \Delta^n y_0. \]

Finally, comparison of the last two equations shows that $B_n = \Delta^n y_0/n!$, as
desired. This establishes (33), and thereby completes the derivation of the
Gregory-Newton interpolation formula.

As a corollary to his interpolation lemma in the Principia, Newton wrote 


Hence the areas of all curves may be nearly found; for if some number of 
points of the curve to be squared are found, and a parabola [polynomial] 
be supposed to be drawn through those points, the area of this parabola 
will be nearly the same with the area of the curvilinear figure proposed to 
be squared: But the parabola can always be squared geometrically by 
methods generally known. 


This remark was the first published allusion to the numerical technique of
approximating an integral $\int_a^b f(x)\,dx$ by evaluating the integral $\int_a^b p(x)\,dx$
of an interpolating polynomial for $f(x)$.

In a concluding scholium to the Methodus Differentialis ([NW II],
p. 172), Newton gave the following example.


If there are four ordinates at equal intervals, let A be the sum of the first 
and fourth, B the sum of the second and third, and R the interval between 
the first and fourth; then...the area between the first and fourth 
ordinates will be $\frac{1}{8}(A + 3B)R$.


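This is the “Newton-Cotes three-eighths rule,”

\[ \int_{x_0}^{x_3} f(x)\,dx \approx \frac{3\Delta x}{8}\,(y_0 + 3y_1 + 3y_2 + y_3). \]

It is obtained by integration of the interpolating polynomial for four
equally spaced points,

\[
\begin{aligned}
\int_{x_0}^{x_3} f(x)\,dx &= (\Delta x)\int_0^3 f(x_0 + s\Delta x)\,ds \\
 &\approx (\Delta x)\int_0^3 \left[\, y_0 + s\Delta y_0 + \tfrac{1}{2}s(s-1)\Delta^2 y_0
      + \tfrac{1}{6}s(s-1)(s-2)\Delta^3 y_0 \right] ds \\
 &= (\Delta x)\left( 3y_0 + \tfrac{9}{2}\Delta y_0 + \tfrac{9}{4}\Delta^2 y_0 + \tfrac{3}{8}\Delta^3 y_0 \right) \\
 &= (\Delta x)\left[\, 3y_0 + \tfrac{9}{2}(y_1 - y_0) + \tfrac{9}{4}(y_2 - 2y_1 + y_0)
      + \tfrac{3}{8}(y_3 - 3y_2 + 3y_1 - y_0) \right] \\
 &= \frac{3\Delta x}{8}\,(y_0 + 3y_1 + 3y_2 + y_3).
\end{aligned}
\]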


EXERCISE 22. Use the interpolating polynomial for three equally spaced points to
obtain

\[ \int_{x_0}^{x_2} f(x)\,dx \approx \frac{\Delta x}{3}\,(y_0 + 4y_1 + y_2). \]


This approximation which, together with higher-order approximations, was known 
to Newton’s disciples Roger Cotes and James Stirling, was rediscovered by Thomas 
Simpson in 1743, and is now called “Simpson’s rule.” 


EXERCISE 23. Apply Simpson’s rule and the three-eighths rule to approximate

\[ \pi = 4\int_0^1 \frac{dx}{1 + x^2}. \]
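
For a numerical comparison of the two rules, the following Python sketch may be helpful. It is an illustration only: the function names are chosen here, and each rule is applied once over the whole interval $[0, 1]$ (three ordinates for Simpson’s rule, four for the three-eighths rule) rather than over subintervals.

```python
import math


def integrand(x):
    """The integrand 4 / (1 + x^2), whose integral over [0, 1] is pi."""
    return 4.0 / (1.0 + x * x)


def simpson(f, a, b):
    """Simpson's rule with three equally spaced ordinates on [a, b]."""
    h = (b - a) / 2
    return h / 3 * (f(a) + 4 * f(a + h) + f(b))


def three_eighths(f, a, b):
    """Newton-Cotes three-eighths rule with four equally spaced ordinates."""
    h = (b - a) / 3
    return 3 * h / 8 * (f(a) + 3 * f(a + h) + 3 * f(a + 2 * h) + f(b))


print(simpson(integrand, 0.0, 1.0))        # approximately 3.1333
print(three_eighths(integrand, 0.0, 1.0))  # approximately 3.1385
print(math.pi)                             # 3.14159...
```

Both rules are exact for cubic polynomials, so neither is expected to reproduce $\pi$ exactly here; the three-eighths value happens to lie somewhat closer.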


Taylor’s Series 


The classical Taylor’s series is so named because it was first published by
Brook Taylor (1685-1731), a disciple of Newton, in his Methodus incrementorum
of 1715. An English translation of the pertinent passage is
included in Struik’s source book ([12], pp. 328-333).

Taylor obtained his series by a limit argument based on the Gregory-Newton
interpolation formula. In the notation of the previous section,
rather than Taylor’s own rather cumbersome notation, his derivation may
be described as follows. If $x = x_0 + n\Delta x$, then the interpolation formula
gives $y = f(x) = f(x_0 + n\Delta x)$ as

\[ y = y_0 + n\Delta y_0 + \frac{n(n-1)}{2!}\Delta^2 y_0 + \frac{n(n-1)(n-2)}{3!}\Delta^3 y_0 + \cdots. \tag{35} \]


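In essence, Taylor wants to take the limit as $\Delta x \to 0$ and $n \to \infty$, while $x_0$
and $x$ remain fixed. If we substitute

\[ n = \frac{x - x_0}{\Delta x}, \quad n - 1 = \frac{x - x_1}{\Delta x}, \quad
   n - 2 = \frac{x - x_2}{\Delta x}, \quad \text{etc.}, \]

then (35) becomes

\[ y = y_0 + (x - x_0)\frac{\Delta y_0}{\Delta x}
   + \frac{(x - x_0)(x - x_1)}{2!}\,\frac{\Delta^2 y_0}{(\Delta x)^2}
   + \frac{(x - x_0)(x - x_1)(x - x_2)}{3!}\,\frac{\Delta^3 y_0}{(\Delta x)^3} + \cdots. \tag{36} \]

In order to evaluate the limit, Taylor proposes to “substitute for evanescent
increments the fluxions proportional to them” ([12], p. 332, Corollary II).

He does this by thinking of $x$ and $y$ as functions of $t$, with $x$ increasing
uniformly (linearly) with $x(0) = x_0$, $x(h) = x_1$, $x(2h) = x_2$, etc. If $h$ is very
small, then

\[ \Delta x = x_1 - x_0 = x(h) - x(0) = \dot{x}_0 h, \]

where $\dot{x}_0 = \dot{x}(0)$ is the fluxion (time derivative) of $x$ when $t = 0$. Similarly,

\[ \Delta y_j = y_{j+1} - y_j = y(jh + h) - y(jh) \approx \dot{y}(jh)\,h, \]

so $\Delta y_0 \approx \dot{y}_0 h$ where $\dot{y}_0 = \dot{y}(0)$ is the fluxion of $y$ when $t = 0$. Then

\[ \Delta^2 y_0 = \Delta y_1 - \Delta y_0 \approx \dot{y}(h)h - \dot{y}(0)h \approx \ddot{y}_0 h^2, \]
\[ \Delta^3 y_0 = \Delta^2 y_1 - \Delta^2 y_0 \approx \ddot{y}(h)h^2 - \ddot{y}(0)h^2 \approx \dddot{y}_0 h^3, \]

and so forth. It follows that

\[ \frac{\Delta y_0}{\Delta x} \approx \frac{\dot{y}_0}{\dot{x}_0}, \quad
   \frac{\Delta^2 y_0}{(\Delta x)^2} \approx \frac{\ddot{y}_0}{(\dot{x}_0)^2}, \quad
   \frac{\Delta^3 y_0}{(\Delta x)^3} \approx \frac{\dddot{y}_0}{(\dot{x}_0)^3}, \quad \text{etc.} \]

It therefore appears that the limiting form of (36) is

\[ y = y_0 + \frac{\dot{y}_0}{\dot{x}_0}(x - x_0) + \frac{\ddot{y}_0}{2!\,(\dot{x}_0)^2}(x - x_0)^2
   + \frac{\dddot{y}_0}{3!\,(\dot{x}_0)^3}(x - x_0)^3 + \cdots, \tag{37} \]

because the points $x_i$ all approach $x_0$ as $\Delta x \to 0$.

Formula (37) is Taylor’s original series. Interpreting the fluxional ratios
as derivatives, we obtain the standard modern form

\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2
   + \frac{f'''(x_0)}{3!}(x - x_0)^3 + \cdots \tag{38} \]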


of Taylor’s series. Taylor’s rather audacious leap across the logical gap 
between (36) and the equivalent of (38) can be partially justified along the 
lines of the following exercise. 


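EXERCISE 24. Substitute $\Delta y_0 = f(x_0 + \Delta x) - f(x_0)$,
$\Delta^2 y_0 = f(x_0 + 2\Delta x) - 2f(x_0 + \Delta x) + f(x_0)$,
$\Delta^3 y_0 = f(x_0 + 3\Delta x) - 3f(x_0 + 2\Delta x) + 3f(x_0 + \Delta x) - f(x_0)$, and then apply
l’Hospital’s rule to prove that

\[ \lim_{\Delta x \to 0} \frac{\Delta y_0}{\Delta x} = f'(x_0), \quad
   \lim_{\Delta x \to 0} \frac{\Delta^2 y_0}{(\Delta x)^2} = f''(x_0), \quad \text{and} \quad
   \lim_{\Delta x \to 0} \frac{\Delta^3 y_0}{(\Delta x)^3} = f'''(x_0). \]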
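
The limits in Exercise 24 can at least be observed numerically. The Python sketch below is an illustration under assumptions made here: the function name `forward_difference` is hypothetical, and the test function $f(x) = e^x$ with $x_0 = 0$ is chosen because all of its derivatives at 0 equal 1. It forms $\Delta^k y_0$ from the alternating binomial sum of Exercise 18 and divides by $(\Delta x)^k$.

```python
import math


def forward_difference(f, x0, dx, k):
    """Δ^k y_0 as the alternating sum of (-1)^i C(k, i) f(x0 + (k - i) dx)."""
    return sum((-1) ** i * math.comb(k, i) * f(x0 + (k - i) * dx)
               for i in range(k + 1))


for dx in (1e-1, 1e-2, 1e-3):
    # Ratios Δ^k y_0 / (Δx)^k for k = 1, 2, 3; each should approach 1.
    ratios = [forward_difference(math.exp, 0.0, dx, k) / dx ** k
              for k in (1, 2, 3)]
    print(dx, ratios)
```

For much smaller values of $\Delta x$ the subtraction of nearly equal floating-point numbers begins to dominate, so the printed ratios should be read as suggestive rather than as a proof.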


Taylor’s series was “in the air” when Taylor published it and, indeed, 
had a certain history that he may not have been aware of. Almost a 
half-century earlier, in a letter to John Collins dated 15 February 1671 
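([NC I], pp. 61-65), James Gregory had listed the first 5 or 6 terms of the
power series expansions of the functions

\[ \tan x, \quad \tan^{-1} x, \quad \log\sec x, \quad \sec^{-1}\!\left(\sqrt{2}\,e^{x}\right), \]

\[ \log\tan\!\left(\frac{\pi}{4} + \frac{x}{2}\right), \quad 2\tan^{-1}\!\left(\tanh\frac{x}{2}\right). \]
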
Although he did not include his derivations of these series, there is some 
indication in Gregory’s unpublished papers that he had calculated the 
derivatives necessary to obtain these series by successive differentiation 
([4], pp. 149-150). It remains unclear whether Gregory had a general 
formula for power series expansions, but evidently he could somehow 
obtain the power series of (almost) any particular function. If Gregory’s 
isolation and premature death had not prevented the full development and 
publication of his research, we might well speak today of the calculus of 
Newton, Leibniz, and Gregory. 

The earliest known explicit statement of the general Taylor’s series was 
given by Newton in a 1691-1692 draft of the De Quadratura. It was, 
however, omitted from the version of this paper that eventually appeared 
in 1704 as an appendix to Newton’s Opticks. Corollary 3 of the draft (see 
[NP VII], pp. 97-99) reads as follows. 


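Hence, indeed, if the series [for y in terms of z] proves to be of this form
\[ y = az + bz^2 + cz^3 + dz^4 + ez^5 + \cdots \tag{39} \]
(where any of the terms $az, bz^2, cz^3, dz^4, \ldots$, can either be lacking or be
negative), the fluxions of y, when z vanishes, are had by setting $\dot{y}/\dot{z} = a$,
$\ddot{y}/\dot{z}^2 = 2b$, $\dddot{y}/\dot{z}^3 = 6c$, $\ddddot{y}/\dot{z}^4 = 24d$,
$\overset{\cdot\cdot\cdot\cdot\cdot}{y}/\dot{z}^5 = 120e, \ldots$ .

Substituting the value $y_0$ (which Newton takes as 0) of $y$ when $z = 0$, it
follows that

\[ y = y_0 + \frac{\dot{y}_0}{\dot{z}_0}\,z + \frac{\ddot{y}_0}{2!\,\dot{z}_0^2}\,z^2
   + \frac{\dddot{y}_0}{3!\,\dot{z}_0^3}\,z^3 + \cdots. \]

This is the case $x_0 = 0$ of (37). Newton’s statement of the general case is his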
following Corollary 4. 

No formal proof is included, but the context indicates that Newton
undoubtedly obtained the listed values of the coefficients $a, b, c, d, e, \ldots$,
by successive differentiations. Thus, fluxional differentiation of
(39) gives

\[ \dot{y} = a\dot{z} + 2bz\dot{z} + 3cz^2\dot{z} + 4dz^3\dot{z} + \cdots, \]

so substitution of $z = 0$ yields $a = \dot{y}_0/\dot{z}_0$. If it is assumed, as Newton
indicates just prior to his statement of Corollary 3, that “z flows uniformly”
so $\dot{z} = \dot{z}_0$ is constant, then another differentiation gives

\[ \ddot{y} = 2b\dot{z}^2 + 6cz\dot{z}^2 + 12dz^2\dot{z}^2 + \cdots, \]

so substitution of $z = 0$ gives $2b = \ddot{y}_0/\dot{z}_0^2$.


EXERCISE 25. Differentiate once more to obtain $6c = \dddot{y}_0/\dot{z}_0^3$.


In the Acta Eruditorum of 1694 John Bernoulli published a series that 
was sufficiently similar to Taylor’s for Bernoulli to accuse Taylor of 
plagiarism when the Methodus incrementorum appeared twenty years later. 
Whereas Bernoulli’s series is sometimes presented as a result of successive 
integration by parts, he started by writing 


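\[
\begin{aligned}
n\,dz &= n\,dz + (z\,dn - z\,dn)
   - \left(\frac{z^2}{2!}\frac{d^2n}{dz^2} - \frac{z^2}{2!}\frac{d^2n}{dz^2}\right)dz
   + \left(\frac{z^3}{3!}\frac{d^3n}{dz^3} - \frac{z^3}{3!}\frac{d^3n}{dz^3}\right)dz - \cdots \\
 &= (n\,dz + z\,dn) - \left(z\frac{dn}{dz} + \frac{z^2}{2!}\frac{d^2n}{dz^2}\right)dz
   + \left(\frac{z^2}{2!}\frac{d^2n}{dz^2} + \frac{z^3}{3!}\frac{d^3n}{dz^3}\right)dz - \cdots, \\
n\,dz &= d(nz) - d\!\left(\frac{z^2}{2!}\frac{dn}{dz}\right)
   + d\!\left(\frac{z^3}{3!}\frac{d^2n}{dz^2}\right)
   - d\!\left(\frac{z^4}{4!}\frac{d^3n}{dz^3}\right) + \cdots.
\end{aligned}
\]

Termwise integration then gives Bernoulli’s series

\[ \int n\,dz = nz - \frac{z^2}{2!}\frac{dn}{dz} + \frac{z^3}{3!}\frac{d^2n}{dz^2}
   - \frac{z^4}{4!}\frac{d^3n}{dz^3} + \cdots. \tag{40} \]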


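EXERCISE 26. Derive Bernoulli’s series by successive integration by parts, starting
with

\[ \int n\,dz = nz - \int z\frac{dn}{dz}\,dz
   = nz - \left(\frac{z^2}{2!}\frac{dn}{dz} - \int \frac{z^2}{2!}\frac{d^2n}{dz^2}\,dz\right). \]


EXERCISE 27. Derive Taylor’s series from Bernoulli’s series as follows. Substitute
$n = f'(z)$, $n = f''(z)$, $n = f'''(z)$, $n = f^{(4)}(z)$ in turn into (40) to obtain the formulas

\[
\begin{aligned}
f(z) - f(0) &= f'(z)\,z - \frac{f''(z)}{2!}z^2 + \frac{f'''(z)}{3!}z^3 - \frac{f^{(4)}(z)}{4!}z^4 + \cdots \\
f'(z) - f'(0) &= f''(z)\,z - \frac{f'''(z)}{2!}z^2 + \frac{f^{(4)}(z)}{3!}z^3 - \cdots \\
f''(z) - f''(0) &= f'''(z)\,z - \frac{f^{(4)}(z)}{2!}z^2 + \cdots \\
f'''(z) - f'''(0) &= f^{(4)}(z)\,z - \cdots
\end{aligned}
\]

Then eliminate $f'(z)$, $f''(z)$, $f'''(z)$ successively to obtain

\[ f(z) = f(0) + f'(0)\,z + \frac{f''(0)}{2!}z^2 + \frac{f'''(0)}{3!}z^3 + \cdots. \]


The case $x_0 = 0$ of Taylor’s series,

\[ f(x) = f(0) + f'(0)\,x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots, \]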


is often called Maclaurin’s series. Colin Maclaurin (1698-1746), perhaps 
the most successful of Newton’s disciples, employed Taylor’s series as a 
fundamental tool in his Treatise of Fluxions (1742). The sections in which 
he introduced Taylor’s series by Newton’s method of successive differentiation,
and then used it to derive sufficient conditions for the
existence of local maxima and minima, are included in Struik’s source
book ([12], pp. 338-341).

If $f'(0) = 0$, then Taylor’s series gives

\[ f(x) = f(0) + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \cdots. \]


Maclaurin writes this in fluxional rather than derivative notation. Assum- 
ing that the terms of degree greater than two are negligible when x is 
sufficiently small, Maclaurin concludes that $f(x) > f(0)$ on both sides of
$x = 0$ if $f''(0) > 0$, while $f(x) < f(0)$ on both sides if $f''(0) < 0$. Thus the
conditions $f'(0) = 0$, $f''(0) > 0$ imply a local minimum, while the conditions
$f'(0) = 0$, $f''(0) < 0$ imply a local maximum. If $f'(0) = f''(0) = 0$ but
$f'''(0) \neq 0$, then

\[ f(x) = f(0) + \frac{f'''(0)}{3!}x^3 + \frac{f^{(4)}(0)}{4!}x^4 + \cdots. \]

Hence it appears that, if $x$ is sufficiently small, then $f(x) > f(0)$ on one side
of $x = 0$ and $f(x) < f(0)$ on the other side. Thus, if the third derivative is
the first non-vanishing one, then neither a maximum nor a minimum
occurs.

In general, if the first $n$ derivatives vanish at $x = 0$, then

\[ f(x) = f(0) + \frac{f^{(n+1)}(0)}{(n+1)!}x^{n+1} + \cdots. \]

Maclaurin concludes from this that


If the first fluxion of the ordinate, with its fluxions of several subsequent 
orders, vanish, the ordinate is a minimum or maximum, when the number 
of all those fluxions that vanish is 1, 3,5, or any odd number. The 
ordinate is a minimum, when the fluxion next to those that vanish is 
positive; but a maximum when this fluxion is negative. ... But if the 
number of all the fluxions of the first and successive orders that vanish be 
an even number, the ordinate is then neither a maximum nor minimum. 
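
Maclaurin’s rule is mechanical enough to be stated as a short procedure. The Python sketch below is a modern paraphrase and not Maclaurin’s own formulation; the function name `classify_by_maclaurin` and the list convention for the successive derivatives are assumptions made here for illustration.

```python
def classify_by_maclaurin(fluxions):
    """Apply Maclaurin's rule at a point.

    fluxions[0] is the first derivative at the point, fluxions[1] the
    second derivative, and so on.
    """
    for vanished, value in enumerate(fluxions):
        if value != 0:
            # `vanished` counts the leading derivatives that are zero.
            if vanished % 2 == 0:
                return "neither"  # an even number (possibly none) vanish
            return "minimum" if value > 0 else "maximum"
    return "undetermined"  # every supplied derivative vanishes


print(classify_by_maclaurin([0, 0, 0, 24]))  # x^4 at 0: minimum
print(classify_by_maclaurin([0, 0, 6]))      # x^3 at 0: neither
print(classify_by_maclaurin([0, -2]))        # -x^2 at 0: maximum
```

Like Maclaurin’s argument, the procedure presupposes that the function is represented near the point by its Taylor series; it says nothing when every listed derivative vanishes.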


In his Institutiones calculi differentialis ([6], pp. 256-258) Euler takes a
characteristically carefree approach to Taylor’s formula. Given two values
$x_0$ and $x$ of the independent variable, write $\omega = x - x_0$, and let

\[ dx = \frac{\omega}{N}, \quad \text{so} \quad x = x_0 + N\,dx, \]

where $N$ is infinitely large (so $dx$ is infinitely small). Then Euler writes the
interpolation series for $f(x) = f(x_0 + N\,dx)$ as

\[ f(x) = y + N\,dy + \frac{N(N-1)}{2!}\,d^2y + \frac{N(N-1)(N-2)}{3!}\,d^3y + \cdots, \tag{41} \]

with the Leibnizian differential $d^k y$ serving as an infinitesimal version of
the Newtonian difference $\Delta^k y_0$. But $N = N - 1 = N - 2 = \cdots$ because $N$ is
infinitely large, so (41) simplifies to

\[ f(x) = y + N\,dy + \frac{N^2}{2!}\,d^2y + \frac{N^3}{3!}\,d^3y + \cdots. \]

Substitution of $N = \omega/dx = (x - x_0)/dx$ then yields Taylor’s series

\[ f(x) = y + (x - x_0)\frac{dy}{dx} + \frac{(x - x_0)^2}{2!}\frac{d^2y}{dx^2}
   + \frac{(x - x_0)^3}{3!}\frac{d^3y}{dx^3} + \cdots, \]

in which Euler takes the quotient $d^k y/(dx)^k$ of higher-order differentials to
be the $k$th order derivative $f^{(k)}(x_0)$.


Fundamental Concepts in the Eighteenth Century 


The calculus entered the eighteenth century encumbered with glaring 
uncertainties regarding the logical foundations of the subject. However, as 
we have seen in this chapter, these uncertainties, which persisted 
throughout the century, did little or nothing to impede a rapid develop- 
ment of the now-standard computational tools of differential calculus. 
During this period integration was generally regarded simply as the inverse 
of differentiation, so integral calculus also was treated as a formal 
manipulative subject.

A vigorous exposition of the inconsistencies of the early calculus was
presented in a 1734 essay by George Berkeley (1685-1753) entitled The 
Analyst, or A Discourse Addressed to an Infidel Mathematician. The “infidel 
mathematician” was Edmund Halley (1656-1742), the astronomer and 
disciple of Newton. The stated purpose of Berkeley, then Anglican 
Bishop-elect of Cloyne (Ireland), was to question whether the foundations 
of mathematics are any firmer than those of religion—“He who can digest 
a second or third fluxion, a second or third difference, need not, methinks, 
be squeamish about any point in divinity.” 


The full text (in English) of The Analyst is readily available in Berkeley’s 
collected works [1], and selected passages are included in Struik’s source 
book ([12], pp. 333-338).

Bishop Berkeley finds the followers of Newton and Leibniz guilty of 
using methods that they do not understand, basing even the derivation of 
valid conclusions on logical inconsistencies and ambiguous concepts. 


And, forasmuch as it may perhaps seem an unaccountable paradox that 
mathematicians should deduce true propositions from false principles, be 
right in the conclusion and yet err in the premises; I shall endeavor 
particularly to explain why this may come to pass, and shew how error 
may bring forth truth, though it cannot bring forth science ([1],
pp. 76-77). 


The Method of Fluxions is the general key by help whereof the modern 
mathematicians unlock the secrets of Geometry, and consequently of
Nature. ... 


And whereas quantities generated in equal times are greater or lesser 
according to the greater or lesser velocity wherewith they increase and are 
generated, a method hath been found to determine quantities from the 
velocities of their generating motions. And such velocities are called 
fluxions: and the quantities generated are called flowing quantities. These 
fluxions are said to be nearly as the increments of the flowing quantities, 
generated in the least equal particles of time; and to be accurately in the 
first proportion of the nascent, or in the last of the evanescent increments. 
...and of the aforesaid fluxions there be other fluxions, which fluxions 
of fluxions are called second fluxions. And the fluxions of these second 
fluxions are called third fluxions: and so on, fourth, fifth, sixth, &c. ad 
infinitum. ... The further the mind analyseth and pursueth these fugitive 
ideas the more it is lost and bewildered; the objects, at first fleeting and 
minute, soon vanishing out of sight ([1], pp. 66-67). 


The foreign mathematicians are supposed by some, even of our own, to 
proceed in a manner less accurate, perhaps, and geometrical, yet more 
intelligible. Instead of flowing quantities and their fluxions, they consider 
the variable finite quantities as increasing or diminishing by the continual 
addition or subduction of infinitely small quantities. Instead of the 
velocities wherewith increments are generated, they consider the increments
or decrements themselves, which they call differences, and which
are supposed to be infinitely small. The difference of a line is an infinitely
little line; of a plain an infinitely little plain. They suppose finite quanti- 
ties to consist of parts infinitely little, and curves to be polygons, whereof 
the sides are infinitely little, which by the angles they make one with 
another determine the curvity of the line. Now to conceive a quantity 
infinitely small, that is, infinitely less than any sensible or imaginable 
quantity, or than any the least finite magnitude is, I confess, above my 
capacity. ... And yet in the calculus differentialis, which method serves to 
all the same intents and ends with that of fluxions, our modern analysts 
are not content to consider only the differences of finite quantities: they 
also consider the differences of those differences, and the differences of 
the differences of the first differences. And so on ad infinitum. That is, 
they consider quantities infinitely less than the least discernible quantity; 
and others infinitely less than those infinitely small ones; and still others 
infinitely less than the preceding infinitesimals, and so on without end or 
limit. ([1], pp. 67-68)


In his most memorable passage, Berkeley answered the claim that the 
difficulties associated with ratios of fluxions or differentials could be 
circumvented by replacing these “ultimate ratios of evanescent quantities” 
with proportional ratios of finite line segments. 


It must, indeed, be acknowledged that [Newton] used fluxions, like the 
scaffold of a building, as things to be laid aside or got rid of as soon as 
finite lines were found proportional to them. But then these finite expo- 
nents are found by the help of fluxions. Whatever therefore is got by such 
exponents and proportions is to be ascribed to fluxions: which must 
therefore be previously understood. And what are these fluxions? The 
velocities of evanescent increments? And what are these same evanescent 
increments? They are neither finite quantities, nor quantities infinitely 
small, nor yet nothing. May we not call them the ghosts of departed 
quantities? ([1], pp. 88-89) 


Besides pointing out (with some accuracy) the lack of clarity in contemporary
conceptions of fluxions and differentials, Berkeley argued that the
basic computations of the calculus invariably involved conflicting suppositions,
and therefore could arrive at valid results only through compensation
of errors. For example, he criticized the extraction of the derivative
$nx^{n-1}$ of $x^n$ from the increment

\[ (x + o)^n - x^n = nx^{n-1}o + \tfrac{1}{2}n(n-1)x^{n-2}o^2 + \cdots \]

by first dividing by $o$, supposing that $o$ is non-zero, and then setting $o$
equal to zero.


Hitherto I have supposed that x flows, that x hath a real increment, that o 
is Something. And I have proceeded all along on that supposition, without
which I should not have been able to have made so much as one single
step. From that supposition it is that I get at the increment of $x^n$, that I
am able to compare it with the increment of x, and that I find the
proportion between the two increments. I now beg leave to make a new
supposition contrary to the first, i.e. I will suppose that there is no
increment of x, or that o is nothing; which second supposition destroys
my first, and is inconsistent with it, and therefore with every thing that
supposeth it. I do nevertheless beg leave to retain $nx^{n-1}$, which is an
expression obtained in virtue of my first supposition, which necessarily
presupposeth such supposition, and which could not be obtained without
it: All which seems a most inconsistent way of arguing, and such as would
not be allowed of in Divinity ([1], p. 73).


Berkeley’s polemic hit close enough to home to inspire a number of 
spirited rejoinders, most of which proved only that their authors hardly 
understood Berkeley, much less the calculus (an exception being 
Maclaurin’s profound Treatise of Fluxions, which may have been moti- 
vated in part by the Berkeley controversy). 

The first step towards resolving Berkeley’s difficulties by explicitly 
defining the derivative as a limit of a quotient of increments, in the manner 
suggested but not stated with sufficient clarity by Newton, was taken by 
Jean d’Alembert (1717-1783). In the article entitled “Différentiel” in vol. 4 
(1754) of the Encyclopédie published by the French Academy, d’Alembert 
wrote 


Leibniz was embarrassed by the objections he felt to exist against infinitely
small quantities, as they appear in the differential calculus; thus he 
preferred to reduce infinitely small to merely incomparable quantities. 
... Newton started out from another principle; and one can say that the 
metaphysics of this great mathematician on the calculus of fluxions is 
very exact and illuminating, even though he allowed us only an imperfect 
glimpse of his thoughts. ... He never considered the differential calculus 
as the study of infinitely small quantities, but as the method of first and 
ultimate ratios, that is to say, the method of finding the limits of ratios. 
Thus this famous author has never differentiated quantities but only 
equations; in fact, every equation involves a relation between two vari- 
ables and the differentiation of equations consists merely in finding the 
limit of the ratio of the finite differences of the two quantities contained 
in the equation. ... Once this is well understood, one will feel that the 
assumption made concerning infinitely small quantities serves only to 
abbreviate and simplify the reasoning; but that the differential calculus 
does not necessarily suppose the existence of these quantities; and that 
moreover this calculus consists in algebraically determining the limit of a 
ratio, for which we already have the expression in terms of lines, and in 
equating those two expressions. ...We have seen above that in the 
differential calculus there are really no infinitely small quantities of the 
first order; that actually those quantities [the differentials] are supposed to 
be divided by other supposedly infinitely small quantities; in this state 
they do not denote either infinitely small quantities or quotients of 
infinitely small quantities; they are the limits of the ratio of two finite 
quantities ([12], pp. 341-345).

Thus d’Alembert presented in this article a view of the derivative that
would now be expressed by simply writing

\[ \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}. \]


Although he did not describe the limit concept itself with the precision that 
would come in the nineteenth century, it was a noteworthy step for him to 
clearly identify the derivative as a limit of a ratio of increments, rather 
than a ratio of either differentials or fluxions. However, this insight failed 
to immediately affect basic expositions of the calculus. Although the 
derivative as a limit of a quotient appeared in occasional late eighteenth 
century discussions of the “metaphysics” of the calculus, most textbooks of 
the time continued to rely mainly on the Leibnizian approach with its 
labyrinth of differentials (see [2], p. 250). 
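
D’Alembert’s limit can be watched at work on the very example that Berkeley attacked. The Python sketch below is an illustration only; the function name `difference_quotient` and the sample values $x = 2$, $n = 3$ are choices made here. It computes the ratio $[(x + o)^n - x^n]/o$ exactly for a sequence of shrinking but non-zero increments $o$.

```python
from fractions import Fraction


def difference_quotient(x, n, o):
    """The ratio of increments ((x + o)^n - x^n) / o for a non-zero increment o."""
    return ((x + o) ** n - x ** n) / o


x, n = Fraction(2), 3
for o in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    # For x = 2, n = 3 the quotient equals 12 + 6*o + o**2 exactly.
    print(o, difference_quotient(x, n, o))
```

The printed ratios 1261/100, 120601/10000, ... approach $nx^{n-1} = 12$ although $o$ is never set equal to zero, which is the sense in which the limit of the ratio replaces Berkeley’s two conflicting suppositions.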

In his Théorie des Fonctions Analytiques published in 1797, Joseph Louis 
Lagrange (1736-1813), the other great figure (with Euler) of eighteenth 
century mathematics, presented a comprehensive development of the 
calculus that was intended to eliminate all references to differentials, 
infinitesimals, and limit concepts from the subject. A brief extract illustrating
his approach is included in Struik’s source book ([12], pp. 388-391);
the references below are to the second edition of 1813 [11]. 

Lagrange’s new approach was based on a power series expansion of a 
given function f(x). If x is replaced by x + i then “by the theory of series”, 
as he says, we obtain 


\[ f(x + i) = f(x) + pi + qi^2 + ri^3 + \cdots, \tag{42} \]


where the coefficients p,q, r,..., are new functions of x, derived (in 
some way to be determined) from the original function f(x). Ad hoc 
expansions of this sort are available for particular familiar functions and, 
in Chapter I ([11], pp. 8-9), Lagrange purports to prove that, except for
particular values of $x$, every function can be expanded as in (42), with only
positive integral powers of $i$ appearing. Today we would say that, if
$f(x + i)$ is represented by the convergent power series (42) in a neighborhood
of $i = 0$, then $f$ is analytic at $x$. In this sense Lagrange’s book is
correctly titled: it is a theory of analytic functions (rather than arbitrary
ones). As Cauchy soon pointed out, there are simple functions such as
$f(x) = e^{-1/x^2}$ that are not analytic at $x = 0$.


The first coefficient in (42), for which Lagrange introduces the notation
$p(x) = f'(x)$, is called the first derived function of $f(x)$. Of course it will turn
out to be the derivative of $f(x)$; indeed, this is the historical origin of the
notation $f'(x)$ for the derivative of $f(x)$, as well as of the term “derivative”
itself. In order to identify the coefficients in (42), Lagrange replaces $i$ by
$i + o$, obtaining

\[ f(x + i + o) = f(x) + p\,(i + o) + q\,(i + o)^2 + r\,(i + o)^3 + \cdots \]
\[ f(x + i + o) = f(x) + pi + qi^2 + ri^3 + si^4 + \cdots
   + po + 2qio + 3ri^2o + 4si^3o + \cdots. \tag{43} \]

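He then replaces $x$ by $x + o$ in (42), obtaining

\[
\begin{aligned}
f(x + i + o) &= f(x + o) + p(x + o)\,i + q(x + o)\,i^2 + r(x + o)\,i^3 + \cdots \\
 &= [\,f(x) + f'(x)o + \cdots] + [\,p(x) + p'(x)o + \cdots]\,i + [\,q(x) + q'(x)o + \cdots]\,i^2 + \cdots \\
f(x + i + o) &= f(x) + pi + qi^2 + ri^3 + \cdots + po + p'io + q'i^2o + r'i^3o + \cdots.
\end{aligned}
\tag{44}
\]

Comparison of the coefficients in (43) and (44) then gives

\[ q(x) = \tfrac{1}{2}p'(x) = \frac{f''(x)}{2!}, \quad
   r(x) = \tfrac{1}{3}q'(x) = \frac{f'''(x)}{3!}, \quad
   s(x) = \tfrac{1}{4}r'(x) = \frac{f^{(4)}(x)}{4!}, \quad \text{etc.}, \]

where $f''(x)$ denotes the first derived function of $f'(x)$, etc. Consequently
series (42) becomes

\[ f(x + i) = f(x) + f'(x)\,i + \frac{f''(x)}{2!}i^2 + \frac{f'''(x)}{3!}i^3 + \cdots, \tag{45} \]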


that is, Taylor’s series. 
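
For a polynomial the expansion (42) is finite, so Lagrange’s comparison of coefficients can be checked exactly. The Python sketch below is an illustration under assumptions made here: polynomials are stored as coefficient lists (`c[k]` multiplies $x^k$), and the helper names `expansion_coefficient` and `derivative` are hypothetical. It extracts the coefficients $p, q, r$ of $i, i^2, i^3$ in $f(x + i)$ by the binomial theorem and compares them with formal derivatives.

```python
from math import comb


def expansion_coefficient(c, j):
    """Coefficient of i**j in f(x + i), returned as a polynomial in x.

    c[k] is the coefficient of x**k in f; the result uses the same convention.
    """
    return [c[m + j] * comb(m + j, j) for m in range(len(c) - j)]


def derivative(c):
    """Formal derivative of the polynomial with coefficient list c."""
    return [k * c[k] for k in range(1, len(c))]


f = [5, 0, -2, 1]                  # f(x) = 5 - 2x^2 + x^3
p = expansion_coefficient(f, 1)    # Lagrange's first derived function
q = expansion_coefficient(f, 2)
r = expansion_coefficient(f, 3)

print(p == derivative(f))                    # True: p = f'
print([2 * a for a in q] == derivative(p))   # True: 2q = p'
print([3 * a for a in r] == derivative(q))   # True: 3r = q'
```

The check covers only polynomials, where no question of convergence arises; it does not touch the difficulty that not every function admits an expansion of the form (42).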


Finally Lagrange remarks ([11], p. 19) that only a little knowledge of the
differential calculus is necessary to recognize that the derived functions
$f'(x), f''(x), f'''(x), \ldots$, coincide with the successive derivatives of the
original function f(x). Actually this verification requires the assumption 
that termwise differentiation of (42) with respect to i is valid. 


EXERCISE 28. Differentiate series (42) termwise and then set i = 0 to verify that f’(x) 
is the first derivative of f(x). 


Lagrange’s attempt to expunge from the calculus all trace of infinitesi- 
mal and limit concepts was inevitably unsuccessful. Nevertheless, his book 
on calculus includes several contributions of lasting significance. For 
example, the remainder term for Taylor’s formula makes its first appearance
in Chapter VII ([11], p. 67). Lagrange’s derivation is equivalent to
one that is still seen today in calculus textbooks.

He starts by replacing $x$ by $x - i$ in (45), obtaining

\[ f(x) = f(x - i) + f'(x - i)\,i + \frac{f''(x - i)}{2!}i^2 + \cdots. \]

Substitution of $xz$ for $i$ then gives

\[ f(x) = f(x - xz) + xz f'(x - xz) + \frac{x^2z^2}{2!}f''(x - xz) + \cdots
   + \frac{x^nz^n}{n!}f^{(n)}(x - xz) + \frac{x^{n+1}z^{n+1}}{(n+1)!}f^{(n+1)}(x - xz) + \cdots, \]

\[ f(x) = f(x - xz) + xz f'(x - xz) + \frac{x^2z^2}{2!}f''(x - xz) + \cdots
   + \frac{x^nz^n}{n!}f^{(n)}(x - xz) + x^{n+1}R(x, z). \tag{46} \]


Note that R(x, 0)=0. For a modern derivation of Taylor’s formula with 
remainder, we would define $R(x, z)$ for $z \neq 0$ by (46), instead of assuming
(45) to start with. 

Next Lagrange differentiates Equation (46) with respect to z, obtaining 


\[ 0 = -xf'(x - xz) + xf'(x - xz) - x^2zf''(x - xz) + x^2zf''(x - xz) - \cdots
   - \frac{x^{n+1}z^n}{n!}f^{(n+1)}(x - xz) + x^{n+1}R'(x, z), \]

so pairwise cancellation of all but the last two terms yields

\[ R'(x, z) = \frac{z^n}{n!}f^{(n+1)}(x - xz) \tag{47} \]



for the partial derivative of $R(x, z)$ with respect to $z$. Now let $M$ and $N$ be
the minimum and maximum values, respectively, of $f^{(n+1)}(x - xz)$ for
$z \in [0, 1]$. Then (47) gives

\[ \frac{Mz^n}{n!} \le R'(x, z) \le \frac{Nz^n}{n!} \]

for $z \in [0, 1]$. Because $R(x, 0) = 0$, antidifferentiation of this inequality
yields

\[ \frac{Mz^{n+1}}{(n+1)!} \le R(x, z) \le \frac{Nz^{n+1}}{(n+1)!} \]

for $z \in [0, 1]$. Taking $z = 1$ in particular, we obtain

\[ \frac{M}{(n+1)!} \le R(x, 1) \le \frac{N}{(n+1)!}. \]

At this point Lagrange assumes what is now called the intermediate value
theorem to conclude that

\[ R(x, 1) = \frac{f^{(n+1)}(x - xz)}{(n+1)!} \]

for some $z \in [0, 1]$. Substitution of this value with $z = 1$ into Equation (46)
finally gives

\[ f(x) = f(0) + f'(0)\,x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n
   + \frac{f^{(n+1)}(u)}{(n+1)!}x^{n+1}, \tag{48} \]

where $u = x - xz \in [0, x]$.

Of course the final term of (48) is the “Lagrange form” of the remainder 
term. Lagrange treats explicitly only the particular cases n =0, 1, 2, indi- 
cating that the general result follows in an analogous manner. He quite 
accurately identifies his Taylor’s formula with remainder as a “new theo- 
rem remarkable for its simplicity and generality.” 
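
Lagrange’s bounds on the remainder are easy to test numerically in a simple case. The Python sketch below is an illustration under assumptions made here: the function $f(x) = e^x$, the values $x = 1$ and $n = 4$, and the helper name `taylor_polynomial_exp` are all choices of this example. Since every derivative of $e^x$ is $e^x$, the minimum and maximum of $f^{(n+1)}$ on $[0, x]$ are $e^0$ and $e^x$.

```python
import math


def taylor_polynomial_exp(x, n):
    """Sum of x^k / k! for k = 0, ..., n: the degree-n Taylor polynomial of exp at 0."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


x, n = 1.0, 4
remainder = math.exp(x) - taylor_polynomial_exp(x, n)
scale = x ** (n + 1) / math.factorial(n + 1)

lower = math.exp(0.0) * scale  # M x^(n+1) / (n+1)!, with M the minimum of exp on [0, x]
upper = math.exp(x) * scale    # N x^(n+1) / (n+1)!, with N the maximum of exp on [0, x]

# The intermediate point u of (48) satisfies exp(u) * scale == remainder.
u = math.log(remainder / scale)

print(lower <= remainder <= upper)  # True
print(0.0 <= u <= x)                # True
```

Here the remainder is about 0.0099, between the bounds 1/120 and e/120, and the intermediate point $u$ is about 0.18.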

This work of Lagrange was a fitting climax for the eighteenth century 
development of the calculus. It provided a reasonably firm foundation for 
the Taylor series that had typified the central role of infinite series 
expansions in eighteenth century investigations, and at the same time 
pointed up the need for the studies of basic properties of continuous 
functions (e.g., maximum-minimum and intermediate value properties) 
that were soon to follow in the nineteenth century.


The Calculus According to
Cauchy, Riemann, and Weierstrass


Functions and Continuity at the Turn of the Century 


We have seen that, at the beginning of his Introductio, Euler defined a
function of a variable quantity as “an analytical expression composed in 
any way from this variable quantity and from numbers or constant 
quantities.” Thus a particular function was defined in terms of a specific 
formula or equation; Euler assumed, moreover, that a given function is 
prescribed throughout its “domain of definition” by one and the same 
“analytical expression.” 

This narrowly analytical conception of functions generally prevailed in 
eighteenth century calculus, but was called into question by discussions 
between Euler, d’Alembert, and Daniel Bernoulli (1700-1782, son of John 
Bernoulli) concerning the nature of the “arbitrary functions” that arise in 
the integration of partial differential equations, such as the one that 
represents the motion of a vibrating string. For some extracts from this 
discussion see Struik’s source book ([17], pp. 351-368). An exhaustive
exposition of the whole matter is given by C. A. Truesdell in his introduc- 
tion to Volume 11 (part 2), Series II of Euler’s collected works [18]. 

In a 1747 paper d’Alembert investigated the motion of a vibrating string 
that is stretched between the points $x = 0$ and $x = L$ on the x-axis. He
introduced a condition equivalent to the partial differential equation


$$\frac{\partial^2 y}{\partial t^2} = a^2 \frac{\partial^2 y}{\partial x^2}, \tag{1}$$

$y(x, t)$ being the (transverse) displacement at time $t$ of the point $x$ on the
string. (Actually, d’Alembert used the arclength $s$ in place of the variable
$x$, and the partial differential notation was supplied by Euler several years
later.) He then observed that (1) is satisfied by any function of the form


$$y(x, t) = \phi(x + at) + \psi(x - at) \tag{2}$$


where $\phi$ and $\psi$ are “arbitrary functions” of a single variable. Assuming that
the string is initially set into motion by first deforming it into the shape
$y = f(x)$, and then releasing it at time $t = 0$, it follows that its subsequent
motion is described by


$$y(x, t) = \tfrac{1}{2}\bigl[f(x + at) + f(x - at)\bigr]. \tag{3}$$


EXERCISE 1. If $\phi$ and $\psi$ are twice differentiable functions of a single variable, show
that the function $y(x, t)$ defined by (2) satisfies Equation (1).


EXERCISE 2. Derive the solution (3) by imposing the initial conditions $y(x, 0) = f(x)$
and $D_t y(x, 0) = 0$ on the function defined by (2).
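
The claims of Exercises 1 and 2 can be checked numerically, although such a check is no substitute for the proofs requested. The sketch below assumes a hypothetical smooth initial shape f(x) = exp(-x^2) and the sample value a = 2, neither of which comes from the text, and uses central differences to compare the two sides of Equation (1) for the function (3).

```python
import math

A = 2.0  # the constant a in Equation (1); an arbitrary illustrative value

def f(x):
    """A smooth sample initial shape (hypothetical, chosen for illustration)."""
    return math.exp(-x * x)

def y(x, t):
    """The function (3): the average of two traveling copies of f."""
    return 0.5 * (f(x + A * t) + f(x - A * t))

def second_difference(g, s, h=1e-3):
    """Central-difference approximation to the second derivative g''(s)."""
    return (g(s + h) - 2.0 * g(s) + g(s - h)) / (h * h)

x0, t0 = 0.3, 0.2
y_tt = second_difference(lambda t: y(x0, t), t0)  # second derivative in t
y_xx = second_difference(lambda x: y(x, t0), x0)  # second derivative in x
residual = y_tt - A * A * y_xx                    # should be close to zero
print(residual)
```

The residual is small but not exactly zero, because the second derivatives are only approximated by finite differences.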


Now the disagreement that arose concerned the type of arbitrary func- 
tion y = f(x) that could be assumed to represent the initial shape of the 
string. D’Alembert argued that, in order to apply legitimately the operations
of the calculus, it was necessary that each such function be expressed
everywhere in terms of one and the same algebraic or transcendental 
equation. This requirement was described at the time by saying that “the 
function is subject to the law of continuity of form.” 

Euler countered that this requirement is physically unrealistic—a string 
may well be plucked in such a way that its initial shape is described by 
different analytical expressions in different intervals. For example, if a 
stretched string is set into motion by displacing its midpoint a unit distance 
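before releasing it at time $t = 0$, then its initial shape is described by


$$f(x) = \begin{cases} \dfrac{2x}{L} & \text{if } x \in \left[0, \tfrac{L}{2}\right], \\[2ex] \dfrac{2(L - x)}{L} & \text{if } x \in \left[\tfrac{L}{2}, L\right]. \end{cases}$$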

Euler therefore argued for the admission into mathematical analysis of 
functions that today would be called piecewise-smooth, but which he 
referred to as “mixed” or “irregular and discontinuous,” because they 
corresponded to different “continuous” (i.e. smooth) functions on different
intervals. He also included as a “discontinuous” function one whose graph 
can be traced with a free motion of the hand, such a function not being 
subject to any “law of continuity” whatever. 

Thus, in the late eighteenth century view of functions that stemmed 
largely from the vibrating string controversy, “continuity” referred to a 
constancy of the analytical expression of a function, rather than to con- 
nectedness of its graph (the modern idea of continuity). Indeed, essentially 
all functions treated in eighteenth century analysis were continuous in the 
modern sense; “discontinuity” then referred either to failure at isolated
points (where the analytical expression changed) of a function to be 
smooth (in the modern sense), or to the lack of any analytical expression at 
all (as in the case of freehand curves). 

By contrast, the modern sense of discontinuity is more nearly that of 
discontiguity. This distinction was first made explicit by Louis Arbogast 
(1759-1803). In 1787 the Academy of St. Petersburg offered a prize for the 
best answer to the question: 


Whether the arbitrary functions that one obtains by the integration of an 
equation in three or more variables represent any curves or surfaces 
whatsoever, either algebraic or transcendental, either mechanical, discon- 
tinuous, or produced by a free motion of the hand; or whether these 
functions include only continuous curves represented by an algebraic or 
transcendental equation. 


In his 1791 memoir which won this prize, Arbogast wrote 


The law of continuity consists in that a quantity cannot pass from one 
state to another without passing through all the intermediate states which 
are subject to the same law. Algebraic functions are regarded as continu- 
ous because the different values of these functions depend in the same 
manner on those of the variable; and, supposing that the variable in- 
creases continually, the function will receive corresponding variations; 
but it will not pass from one value to another without also passing 
through all the intermediate values. Thus the ordinate y of an algebraic 
curve, when the abscissa x varies, cannot pass brusquely from one value 
to another; there cannot be a saltus from one ordinate to another which 
differs from it by an assignable quantity; but all the successive values of y 
must be linked together by one and the same law which makes the 
extremities of these ordinates make up a regular and continuous curve. 


Here he singles out the “intermediate value property” that was soon to 
play an important role in calculus. 


This continuity may be destroyed in two manners: 1. The function may 
change its form, that is to say, the law by which the function depends on 
the variable may change all at once. A curve formed by the assemblage of 
many portions of different curves is of this kind... It is not even 
necessary that the function y should be expressed by an equation for a 
certain interval of the variable; it may continually change its form, and 
the line representing it, instead of being an assemblage of regular curves, 
may be such that at each of its points it becomes a different curve; that is 
to say, it may be entirely irregular and not follow any law for any interval 
however small. 

Such would be a curve traced at hazard by the free movement of the 
hand. These kinds of curves can neither be represented by one nor by 
many algebraic or transcendental equations. 
2. The law of continuity is again broken when the different parts of a
curve do not join to one another... We will call curves of this kind 
discontiguous curves, because all their parts are not contiguous, and 
similarly for discontiguous functions (cited by Jourdain [14], 
pp. 675-676). 


Arbogast decided that the arbitrary functions that appear in the solu- 
tions of partial differential equations may be neither continuous nor 
contiguous. However, as we will see in the next section, the pivotal and
clinching argument for the necessity of considering discontinuous func- 
tions in mathematical analysis was provided by Joseph Fourier 
(1768-1830) in the first decade of the nineteenth century. 


Fourier and Discontinuity 


Fourier’s celebrated book Théorie analytique de la chaleur (The Analytical
Theory of Heat) was published in 1822, but much that it contains dates 
back to a memoir that he presented to the Paris Academy of Sciences in 
1807. In it he developed into a comprehensive general theory the method 
of trigonometric series that Euler and Bernoulli had applied to isolated 
special cases in their work on the vibrating string a half-century earlier (see 
[17], pp. 360-367). 

A typical problem in the theory of heat of the sort that Fourier considers 
asks for a steady-state temperature function $u(x, y)$ in the region $0 < x < \pi$,
$y > 0$ satisfying the conditions


$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \tag{4}$$
$$u(0, y) = u(\pi, y) = 0, \tag{5}$$
$$u(x, 0) = \phi(x), \tag{6}$$


where $\phi(x)$ is a given function prescribing the temperature on the base of
the region. Fourier observes (e.g., by separation of variables) that each of
the simple functions

$$e^{-ny} \sin nx, \qquad n = 1, 2, 3, \ldots$$

satisfies conditions (4) and (5), and concludes by superposition (assuming
convergence, termwise differentiability, etc.) that the more general function


$$u(x, y) = \sum_{n=1}^{\infty} b_n e^{-ny} \sin nx$$

does also, $b_1, b_2, b_3, \ldots$ being arbitrary constants. The final condition (6)
can then be satisfied as well if these constants can be chosen so that

$$\phi(x) = \sum_{n=1}^{\infty} b_n \sin nx, \qquad x \in (0, \pi). \tag{7}$$
Fourier attacks the “development of an arbitrary function in trigonometric
series” in Chapter III, Section VI of [8]. His first heuristic procedure for
the evaluation of the constants $\{b_n\}$ may be outlined as follows. He
assumes that $\phi(x)$ is an odd function with Taylor series


$$\phi(x) = x\phi'(0) + \frac{x^3}{3!}\phi'''(0) + \frac{x^5}{5!}\phi^{(5)}(0) + \cdots. \tag{8}$$


By first substituting the Taylor series for sin nx on the right-hand side of 
(7), and then equating coefficients of like powers of x in (7) and (8), 
Fourier obtains the infinite set of linear equations 


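$$\begin{aligned}
\phi'(0) &= b_1 + 2b_2 + 3b_3 + 4b_4 + \cdots \\
-\phi'''(0) &= b_1 + 2^3 b_2 + 3^3 b_3 + 4^3 b_4 + \cdots \\
\phi^{(5)}(0) &= b_1 + 2^5 b_2 + 3^5 b_3 + 4^5 b_4 + \cdots \\
-\phi^{(7)}(0) &= b_1 + 2^7 b_2 + 3^7 b_3 + 4^7 b_4 + \cdots
\end{aligned}$$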


To solve this system, he first approximates $b_1, \ldots, b_m$ by deleting all terms
in the first $m$ equations with subscripts greater than $m$ (hence $m$ linear
equations in $m$ unknowns), and finally computes the limits of these
approximations as $m \to \infty$. The result of this complicated elimination
process ([8], pp. 169-183) is 


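$$b_n = (-1)^{n-1}\,\frac{2}{n\pi}\,a_n,$$


$$a_n = \phi(\pi) - \frac{1}{n^2}\phi''(\pi) + \frac{1}{n^4}\phi^{(4)}(\pi) - \frac{1}{n^6}\phi^{(6)}(\pi) + \cdots. \tag{9}$$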


Considering $a_n$ as a function of $\pi$, differentiating twice, comparing the
results, and finally replacing $\pi$ by $x$, he obtains


$$\frac{1}{n^2}\frac{d^2 a_n}{dx^2} + a_n(x) = \phi(x).$$


The general solution of this second order ordinary differential equation is

$$a_n(x) = A\cos nx + B\sin nx + (n \sin nx)\int_0^x \phi(t)\cos nt\,dt - (n\cos nx)\int_0^x \phi(t)\sin nt\,dt.$$

Because $A = a_n(0) = 0$ (why?), it finally follows that

$$b_n = (-1)^{n-1}\,\frac{2}{n\pi}\,a_n(\pi) = \frac{2}{\pi}\int_0^{\pi}\phi(t)\sin nt\,dt, \tag{10}$$
the now-familiar formula for the Fourier coefficients! Only after carrying 


through this technical tour de force does Fourier point out ([8],
pp. 187-188) that formula (10) can be “verified” by the now standard
device of multiplying both sides of equation (7) by $\sin mx$ and then
integrating termwise from $0$ to $\pi$, making use of the orthogonality of the
sine functions on the interval $[0, \pi]$.
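
Fourier's first procedure, truncating the infinite system to m equations in m unknowns, can be tried on a sample function. The sketch below is an illustration written for this purpose, not Fourier's own computation. It assumes the sample function φ(x) = x, for which φ'(0) = 1 and all higher odd derivatives at 0 vanish, and solves the truncated system exactly with rational arithmetic. The helper name `solve_truncated_system` is hypothetical.

```python
from fractions import Fraction

def solve_truncated_system(rhs):
    """Solve sum_{n=1..m} n**(2k-1) * b_n = rhs[k-1] for k = 1..m, exactly.

    Plain Gauss-Jordan elimination over the rationals.
    """
    m = len(rhs)
    rows = [[Fraction(n ** (2 * k - 1)) for n in range(1, m + 1)]
            + [Fraction(rhs[k - 1])] for k in range(1, m + 1)]
    for col in range(m):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[k][m] for k in range(m)]

# Right-hand sides for phi(x) = x: (phi'(0), -phi'''(0), ...) = (1, 0, 0, ...).
for m in (1, 2, 5, 20):
    b = solve_truncated_system([1] + [0] * (m - 1))
    print(m, float(b[0]))
```

For this sample function the approximations to b_1 increase with m toward 2, which is the first coefficient of the sine series 2(sin x - (1/2) sin 2x + (1/3) sin 3x - ...) of x. The convergence is slow, which gives some sense of why the passage to the limit was laborious.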


EXERCISE 3. Assuming (7) and the validity of termwise integration, verify (10) in 
this way. The needed “orthogonality” is the fact that 


$$\int_0^{\pi} \sin mx \sin nx\,dx = 0 \quad \text{if } m \ne n.$$
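
The orthogonality relation and the integral formula (10) can also be illustrated numerically. The following sketch assumes a simple midpoint rule for the integrals and the sample function φ(x) = x. Both are choices made here, not taken from Fourier. The computed coefficients are compared with the known values 2(-1)^(n+1)/n.

```python
import math

def integrate(g, a, b, steps=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def sine_coefficient(phi, n):
    """b_n = (2/pi) * integral over [0, pi] of phi(t) sin(nt) dt, as in (10)."""
    return (2.0 / math.pi) * integrate(lambda t: phi(t) * math.sin(n * t),
                                       0.0, math.pi)

# Orthogonality on [0, pi]: zero for m != n, and pi/2 for m == n.
cross = integrate(lambda x: math.sin(2 * x) * math.sin(5 * x), 0.0, math.pi)
same = integrate(lambda x: math.sin(3 * x) ** 2, 0.0, math.pi)

# Sample function phi(x) = x.
coeffs = [sine_coefficient(lambda t: t, n) for n in range(1, 5)]
print(cross, same, coeffs)
```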


We see by this that the coefficients a, b,c, d,e, f,..., which enter 
into the equation 


$$\tfrac{1}{2}\pi\phi(x) = a \sin x + b \sin 2x + c \sin 3x + d \sin 4x + \cdots,$$


and which we found formerly by way of successive eliminations, are the
values of definite integrals expressed by the general term $\int \sin ix\,\phi(x)\,dx$,
i being the number of the term whose coefficient is required. This remark
is important, because it shows how even entirely arbitrary functions may
be developed in series of sines of multiple arcs. In fact, if the function
$\phi(x)$ be represented by the variable ordinate of any curve whatever whose
abscissa extends from $x = 0$ to $x = \pi$, and if on the same part of the axis
the known trigonometric curve, whose ordinate is $y = \sin x$, be constructed,
it is easy to represent the value of any integral term. We must
suppose that for each abscissa x, to which corresponds one value of $\phi(x)$,
and one value of $\sin x$, we multiply the latter value by the first, and at the
same point of the axis raise an ordinate equal to the product $\phi(x) \sin x$.
By this continuous operation a third curve is formed, whose ordinates are
those of the trigonometric curve, reduced in proportion to the ordinates of
the arbitrary curve which represents $\phi(x)$. This done, the area of the
reduced curve taken from $x = 0$ to $x = \pi$ gives the exact value of the
coefficient of $\sin x$; and whatever the given curve may be which corresponds
to $\phi(x)$, whether we can assign to it an analytical equation, or
whether it depends on no regular law, it is evident that it always serves to
reduce in any manner whatever the trigonometric curve; so that the area
of the reduced curve has, in all possible cases, a definite value which is the
value of the coefficient of $\sin x$ in the development of the function. The
same is the case with the following coefficient b, or $\int \phi(x) \sin 2x\,dx$ ([8],
p. 186).


Here Fourier makes the important observation that, to permit the
calculation of the coefficients in the Fourier series of $\phi(x)$, it suffices for
the region under $y = \phi(x) \sin nx$ to have an area (for each $n$) that can be
interpreted as the value of the integral $\int_0^{\pi} \phi(x) \sin nx\,dx$. It is not necessary
that $\phi(x) \sin nx$ be continuous and therefore have an integral that can be
calculated by antidifferentiation. Moreover, Fourier observed that even if
$\phi(x)$ is continuous on $[0, \pi]$, but $\phi(\pi) \ne 0$, then the extended function to
which its Fourier series converges (presumably) on the whole real line will
necessarily be discontinuous (i.e., discontiguous) at points $x$ that are odd
multiples of $\pi$, because this extended function is odd with period $2\pi$. This
is the case with so simple a function as $\phi(x) = x$. Consequently, the
introduction of Fourier series techniques essentially forced the consideration
of discontinuous functions on an equal footing with continuous ones,
and called for the development of a theory of integration of discontinuous
functions (soon provided, as we will see, by Cauchy and Riemann).
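
The jump at odd multiples of π can be observed directly for φ(x) = x. The sketch below assumes the standard sine coefficients b_n = 2(-1)^(n+1)/n for this function and evaluates a long partial sum at three points. The number of terms is an arbitrary choice, and the partial sums only approximate the limit.

```python
import math

def sine_series_partial_sum(x, terms):
    """Partial sum of 2 * sum_{n>=1} (-1)^(n+1) sin(nx)/n, the sine series of x."""
    return 2.0 * sum((-1) ** (n + 1) * math.sin(n * x) / n
                     for n in range(1, terms + 1))

inside = sine_series_partial_sum(1.0, 5000)            # a point of (0, pi)
at_pi = sine_series_partial_sum(math.pi, 5000)         # the odd multiple pi itself
beyond = sine_series_partial_sum(math.pi + 0.5, 5000)  # just past pi
print(inside, at_pi, beyond)
```

Inside (0, π) the sum is close to x. At x = π every term vanishes, so the sum is 0 rather than π. Just beyond π the sum is close to x - 2π, the value dictated by oddness and period 2π.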


It is remarkable that we can express by convergent series, and, as we 
shall see in the sequel, by definite integrals, the ordinates of lines and 
surfaces which are not subject to a continuous law. We see by this that we 
must admit into analysis functions which have equal values, whenever the 
variable receives any values whatever included between two given limits, 
even though on substituting in these two functions, instead of the vari- 
able, a number included in another interval, the results of the two 
substitutions are not the same. The functions which enjoy this property 
are represented by different lines, which coincide in a definite portion 
only of their course, and offer a singular species of finite osculation. 
These considerations arise in the calculus of partial differential equations; 
they throw a new light on this calculus, and serve to facilitate its 
employment in physical theories ([8], p. 199). 


In the final chapter of his book Fourier presented an outline of a proof 
of the convergence of his trigonometric series, and included the following 
quite general formulation of the functional concept. 


Above all, it must be remarked that the function f(x), to which this proof 
applies, is entirely arbitrary, and not subject to a continuous law... . In 
general the function f(x) represents a succession of values being given to 
the abscissa x, there are an equal number of ordinates f(x).... We do 
not suppose these ordinates to be subject to a common law; they succeed 
each other in any manner whatever, and each of them is given as if it were
a single quantity. 

It may follow from the very nature of the problem, and from the 
analysis which is applicable to it, that the passage from one ordinate to 
the following is effected in a continuous manner. But special conditions 
are then concerned, and the general equation, considered by itself, is 
independent of these conditions. It is rigorously applicable to discontinuous
functions ([8], p. 430).


Although Fourier approaches here the modern concept of a function, his 
working definition of discontinuity in actual practice was that of the 
eighteenth century (discontinuity of analytic form)—his functions (like 
everyone else’s at that time) were at worst piecewise-smooth, with only a 
finite number of “discontiguities” in each finite interval. 

The first example of a “genuinely discontinuous” function, one that 
exhibited the full potential of the concept of a function as an arbitrary 
pairing, was provided by Peter Lejeune-Dirichlet (1805-1859). In an 1829 
paper Dirichlet formulated sufficient conditions for the convergence of a
Fourier series, and gave the first complete rigorous proof of such convergence
(see [9], Chapter 5). At the end of this paper he gave an example of a
function not satisfying the “Dirichlet conditions”: “f(x) equals a determined
constant c when the variable x takes a rational value, and equals
another constant d when this variable is irrational.” This famous function
is of course discontinuous everywhere.


Bolzano, Cauchy, and Continuity 


A precise formulation of the modern concept of continuity first appeared 
in a pamphlet published privately in Prague by the isolated Bohemian 
scholar and priest Bernard Bolzano (1781-1848). Its title stated its pur- 
pose: Purely analytical proof of the theorem, that between each two roots 
which guarantee an opposing result [in sign], at least one real root of the
equation lies. Thus he proposed to give a “purely analytical proof” of the 
intermediate value theorem for continuous functions. 

Bolzano argued that the intuitive geometric proof—a continuous curve 
must somewhere cross any straight line that separates its endpoints—is 
based on an inadequate conception of continuity. To correctly explain the
concept of continuity, he said, one must understand that the meaning of
the phrase “A function f(x) varies according to the law of continuity for
all values of x which lie inside or outside certain limits, is nothing other
than this: If x is any such value, the difference $f(x + w) - f(x)$ can be
made smaller than any given quantity, if one makes w as small as one
wishes.” In other words, f(x) is continuous on an interval provided that
$\lim_{w \to 0} f(x + w) = f(x)$ for each point x of the interval.

As a crucial lemma, Bolzano asserted that, if M is a property of real 
numbers that does not hold for all x, and there exists a number u such that 
all numbers $x < u$ have property M, then there exists a largest U such that
all numbers $x < U$ have property M. In his attempted proof by the
now-familiar bisection method, he produced a “Cauchy sequence” $\{u_n\}_1^\infty$
intended to converge to the desired U. Although he (and later Cauchy)
correctly stated what is now called the “Cauchy convergence criterion”
[$\{u_n\}$ converges if and only if, given $\epsilon > 0$, $|u_m - u_n| < \epsilon$ for $m$ and $n$
sufficiently large], he could not (nor could Cauchy) supply a complete
proof, for lack of a completeness property of the real number system.


EXERCISE 4. Use Bolzano’s lemma to prove that, if $\{a_n\}_1^\infty$ is a bounded, monotone
increasing sequence,


$$a_1 \le a_2 \le \cdots \le a_n \le a_{n+1} \le \cdots \le A,$$


then there is a number U such that $\lim_{n \to \infty} a_n = U$. Hint: Let the number x have
property M provided that $x < a_n$ for some n.
Bolzano applied the lemma above to prove the following generalization
of the theorem of his title: If f(x) and g(x) are continuous functions on the 
interval [a, b] with f(a) <g(a) and f(b) >g(b), then f(x) = g(x) for some 
number x between a and b. 


EXERCISE 5. (Bolzano’s Proof). Let the number r have property M if either $r \le 0$ or
$r > 0$ and $f(a + r) < g(a + r)$. If U is the largest number such that all numbers $r < U$
have property M, show that $a < a + U < b$ and $f(a + U) = g(a + U)$.


Bolzano’s little pamphlet was not widely circulated among contem- 
porary mathematicians, and the extent of its influence is unclear. However, 
in his Cours d’analyse of 1821, Cauchy essentially duplicated Bolzano’s 
definition and immediate applications of continuity. An interesting (if 
perhaps controversial) discussion of the question as to whether Cauchy 
knew of the earlier work of Bolzano is included in an article by Grattan- 
Guinness [10]. 

Augustin-Louis Cauchy (1789-1857) was the dominant mathematical 
figure in a Paris that still regarded itself as the center of the mathematical 
world (despite the fact that Gauss never left Germany). Today Cauchy is 
often credited with the founding of the modern age of rigor in mathemat- 
ics. In this tradition he may be a beneficiary by default on the part of 
Gauss, whose personal standards of rigor were equally high, but whose 
publication policy—“few but ripe’”—was the opposite of Cauchy’s. More- 
over, it may be noted that Cauchy occasionally stumbled conspicuously, as 
in failing to distinguish between continuity and uniform continuity or 
between convergence and uniform convergence. 

Nevertheless, it was Cauchy whose expositions of analysis first stamped 
elementary calculus with the general character that it retains today. Continuing
the pedagogical tradition of the École Polytechnique (Paris), he
wrote three great textbooks, the Cours d’analyse (1821), Résumé des leçons
sur le calcul infinitésimal (1823), and Leçons sur le calcul différentiel (1829),
which were the first to set forth the establishment of complete rigor in
mathematical analysis as a principal goal. These books are reprinted in
volumes 3 and 4 (Series 2) of Cauchy’s collected works, and English
translations of substantial portions of them are provided by R. Iacobacci
[12]. In the introduction to his Résumé ([12], p. 188) Cauchy wrote:


... The methods which I have followed differ in many respects from 
those which are expounded in other works of the same type. My principal 
aim has been to reconcile rigor, which I have made a law to myself in my 
Cours d’analyse, with the simplicity which the direct consideration of 
infinitely small quantities produces. For this reason, I believed it to be my 
duty to reject the development of functions into infinite series each time 
that the series obtained is not convergent... In the integral calculus, it 
has appeared to me necessary to demonstrate generally the existence of 
the integrals or primitive functions before making known their diverse 
properties. In order to attain this object, it was found necessary to
establish at the outset the notion of integrals taken between given limits or 
definite integrals. 


Cauchy’s was the first comprehensive treatment of mathematical analy- 
sis to be based from the outset on a reasonably clear definition of the limit 
concept: 


When the successive values attributed to a variable approach indefinitely 
a fixed value so as to end by differing from it by as little as one wishes, 
this last [fixed value] is called the limit of all the others. Thus, for
example, an irrational number is the limit of diverse fractions which
furnish more and more approximate values of it ([3], p. 19; [12], p. 191).


The device that enabled him to “reconcile rigor with infinitesimals” was a 
new definition of infinitesimals that avoided the infinitely small fixed 
numbers of earlier mathematicians. Cauchy defined an infinitesimal (“un
infiniment petit”) or infinitely small quantity (“quantité infiniment petite”) to
be simply a variable with zero as its limit ([3], p. 19). Again,


One says that a variable quantity becomes infinitely small when its 
numerical value decreases indefinitely in such a way as to converge 
towards the value zero ([3], p. 37; [12], p. 194).


Let a be an infinitesimal (“une quantité infiniment petite”), that is to say
a variable whose numerical value decreases indefinitely ([3], p. 38; [12],
p. 196).


Although Cauchy complicates his exposition by discussing infinitely 
small quantities in the language of variables rather than that of functions, 
it is clear that by the phrases “un infiniment petit” and “une quantité
infiniment petite” (both of which are frequently rendered into English as
“an infinitesimal,” e.g., by Iacobacci) he means a dependent variable or
function a(h) that approaches zero as $h \to 0$. In particular, his “infinitesimals”
are no longer the infinitely small fixed numbers that earlier had been
the source of so much confusion and controversy.

In Chapter II of the Cours d’analyse Cauchy introduces the concept of
continuity for a function defined on an interval, with essentially the same 
definition that Bolzano had given. If, given a value of x within the interval 
where the function f is defined, 


one assigns to the variable x an infinitely small increment a, the function 
itself will take on for an increment the difference 


f(x +a) — f(x), 
which will depend at the same time on the new variable a and on the 


value of x. This granted, the function f(x) will be, between the two limits 
assigned to the variable x, a continuous function of the variable if, for 
each value of x intermediate between these limits, the numerical value of
the difference 

f(x +a) — f(x) 
decreases indefinitely with that of a. In other words, the function f(x) will 
remain continuous with respect to x between the given limits, if, between 
these limits, an infinitely small increment of the variable always produces an 
infinitely small increment of the function itself ([3], p. 43; [12], p. 201).


Note that the final italicized statement does not refer to fixed infinitesi- 
mals; it says simply that the variable f(x + a) — f(x) is an infinitely small 
quantity (as previously defined) whenever the variable a is, that is, that 
f(x + a) — f(x) approaches zero as a does. Thus Cauchy uses the statement 
that “f(x + a) — f(x) is infinitely small when a is” in the same way that 
one often does today—as a convenient abbreviation for a more com- 
plicated statement involving limits. 

He points out that the continuity of the familiar elementary functions is 
easily verified (on intervals containing no singular points corresponding to 
zero denominators). For example, the function sin x is continuous on every 
interval because “the numerical value of $\sin(\tfrac{1}{2}a)$, and consequently that of
the difference


$$\sin(x + a) - \sin x = 2 \sin(\tfrac{1}{2}a) \cos(x + \tfrac{1}{2}a)$$


decreases indefinitely with that of a” ([3], p. 44).
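
Cauchy's remark can be made quantitative. The following estimate is a modern gloss, not Cauchy's wording, and it assumes the standard inequalities $|\sin t| \le |t|$ and $|\cos t| \le 1$, which are not stated in the passage quoted:

```latex
\[
|\sin(x + a) - \sin x|
  = 2\,\bigl|\sin\tfrac{1}{2}a\bigr|\,\bigl|\cos\bigl(x + \tfrac{1}{2}a\bigr)\bigr|
  \le 2 \cdot \tfrac{1}{2}|a| \cdot 1
  = |a| .
\]
```

Thus the increment of the function is at most the increment of the variable in absolute value, so it tends to zero with a, whatever the value of x.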


EXERCISE 6. Show directly that $e^x$ is continuous everywhere. What must you
assume? 


Cauchy next discusses the continuity of a composition of continuous 
functions. However, he errs in his attempted proof that a continuous 
function of several continuous functions is continuous, erroneously think- 
ing that he has proved that f(x, y, z) is continuous, that is, 

$$\lim_{\alpha, \beta, \gamma \to 0} f(x + \alpha, y + \beta, z + \gamma) = f(x, y, z)$$
provided that f is continuous in each of the independent variables x, y, z 
separately (see Theorem I on page 47 of [3]). 
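
That continuity in each variable separately does not suffice can be seen from a standard two-variable counterexample, which is not taken from Cauchy's text: g(x, y) = xy/(x^2 + y^2) with g(0, 0) = 0. The sketch below evaluates g near the origin along each axis and along the diagonal. The names used are illustrative.

```python
def g(x, y):
    """Standard counterexample (not from Cauchy): xy/(x^2 + y^2), g(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

steps = [10.0 ** (-k) for k in range(1, 8)]
along_x_axis = [g(h, 0.0) for h in steps]   # y held fixed at 0
along_y_axis = [g(0.0, h) for h in steps]   # x held fixed at 0
along_diagonal = [g(h, h) for h in steps]   # both increments shrink together
print(along_x_axis[-1], along_y_axis[-1], along_diagonal[-1])
```

Along either axis the values are identically 0, matching g(0, 0), while along the diagonal they stay at 1/2 however small the increments become, so the joint limit at the origin does not exist.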

Cauchy states the intermediate value theorem as Theorem IV on page 
50: If f(x) is continuous on the interval $[x_0, X]$, and $b$ is a number between
$f(x_0)$ and $f(X)$, then there exists at least one point x of the interval such
that $f(x) = b$. He provides an intuitive geometric proof, but in a note on
the numerical solution of equations ((3], pp. 378-425) includes an alterna- 
tive proof by “une methode directe et purement analytique” (shades of 
Bolzano?). 

Taking $b = 0$ and $m$ an integer larger than one, he first subdivides $[x_0, X]$
into $m$ equal subintervals. Since f(x) changes sign on $[x_0, X]$ it must
change sign on some subinterval $[x_1, X_1]$. Next $[x_1, X_1]$ is subdivided into
$m$ equal subintervals, on one of which, say $[x_2, X_2]$, f(x) again changes
sign. Continuing in this way Cauchy constructs an increasing sequence
$\{x_n\}_1^\infty$ of points of $[x_0, X]$ such that each value $f(x_n)$ has the same sign as
$f(x_0)$, and a decreasing sequence $\{X_n\}_1^\infty$ such that each $f(X_n)$ has the same
sign as $f(X)$. Because $X_n - x_n = (X - x_0)/m^n \to 0$ as $n \to \infty$, he concludes
that these two sequences converge to a common limit point $a \in (x_0, X)$. By
continuity, $f(x_n) > 0$ (say) for each $n$ implies $f(a) = \lim f(x_n) \ge 0$, while
$f(X_n) < 0$ implies $f(a) \le 0$. It therefore follows that $f(a) = 0$, as desired.
This proof of the intermediate value theorem is the one that is perhaps 
most frequently found in modern textbooks. 
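
Cauchy's construction is effectively an algorithm, and it can be sketched in code. The version below is an illustration under stated assumptions, not a transcription of Cauchy's note. It assumes f is continuous with values of opposite sign at the endpoints, always takes the first subinterval on which a sign change (or a zero) is detected, and stops after a fixed number of rounds. The function name and the sample equation x^3 = 2 are choices made here.

```python
def cauchy_subdivision_root(f, x0, X, m=2, rounds=60):
    """Follow a sign change of f through repeated division into m equal parts.

    Assumes f is continuous on [x0, X] and f(x0), f(X) have opposite signs.
    Returns the final bracketing interval (x_n, X_n).
    """
    if f(x0) * f(X) > 0:
        raise ValueError("f must change sign on [x0, X]")
    left, right = x0, X
    for _ in range(rounds):
        # Endpoints of the m equal pieces of the current interval.
        points = [left + k * (right - left) / m for k in range(m)] + [right]
        for a, b in zip(points, points[1:]):
            if f(a) * f(b) <= 0:  # sign change (or a zero) on this piece
                left, right = a, b
                break
    return left, right

lo, hi = cauchy_subdivision_root(lambda x: x**3 - 2.0, 1.0, 2.0, m=3, rounds=40)
print(lo, hi)
```

With m = 2 this is the familiar bisection method. The left endpoints never decrease and the right endpoints never increase, mirroring the two monotone sequences in the proof, and both end up close to the cube root of 2.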


In Chapter VI of his Cours d’analyse Cauchy presents the first systematic
study of convergence of infinite series, including statements and proofs of
the ratio and root tests. In Theorem I on page 120 he makes his famous
incorrect assertion that the sum of a convergent infinite series of continuous
functions is itself a continuous function. Under Problem I on page 146
he makes an incomplete but quite interesting attempt to establish Newton’s
binomial series


$$(1 + x)^\mu = 1 + \frac{\mu}{1}x + \frac{\mu(\mu - 1)}{1 \cdot 2}x^2 + \frac{\mu(\mu - 1)(\mu - 2)}{1 \cdot 2 \cdot 3}x^3 + \cdots. \tag{11}$$


First he points out that (by the ratio test) this infinite series converges if
$|x| < 1$. With $x$ fixed, he denotes its sum by $\phi(\mu)$, and concludes (by his
incorrect assertion) that $\phi$ is a continuous function of $\mu$. Direct computation
(by the “Cauchy product” of two infinite series) then shows that


$$\phi(\mu)\phi(\mu') = \phi(\mu + \mu'). \tag{12}$$

But he has shown in Problem 2 of Chapter V that this functional equation
for a continuous function implies that


$$\phi(\mu) = [\phi(1)]^\mu,$$


that is, $\phi(\mu)$ is an exponential function of $\mu$. But $\phi(1) = 1 + x$, so this means
that $\phi(\mu) = (1 + x)^\mu$ for $|x| < 1$ and $\mu$ arbitrary, as desired. The first
complete verification of the binomial series was given by Abel in 1826.
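
The steps of Cauchy's argument can be checked numerically for particular values. The sketch below sums the binomial series for a fixed x with |x| < 1 and tests the functional equation and the closed form at sample exponents. The value x = 0.3, the exponents, and the number of terms are arbitrary choices made here, and a numerical check of this kind proves nothing in general.

```python
def binomial_series(mu, x, terms=200):
    """Partial sum of 1 + mu*x + mu(mu-1)/2! * x^2 + ... for |x| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (mu - k) * x / (k + 1)  # ratio of consecutive terms
    return total

x = 0.3

def phi(mu):
    """The sum of the series regarded as a function of the exponent mu."""
    return binomial_series(mu, x)

product_gap = phi(0.5) * phi(-1.25) - phi(0.5 - 1.25)  # functional equation
closed_form_gap = phi(2.5) - (1.0 + x) ** 2.5           # phi(mu) = (1 + x)^mu
print(phi(1.0), product_gap, closed_form_gap)
```

Both gaps are at the level of rounding error, and phi(1.0) reproduces 1 + x because the series terminates when the exponent is a positive integer.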


EXERCISE 7. (a) Assuming (12) for all $\mu$ and $\mu'$, show that $\phi(m) = [\phi(1)]^m$ and
$\phi(1/m) = [\phi(1)]^{1/m}$ if $m$ is a positive integer.

(b) Deduce that $\phi(m/n) = [\phi(1)]^{m/n}$ if $m$ and $n$ are integers.

(c) Conclude by continuity that $\phi(\mu) = [\phi(1)]^\mu$ if $\mu$ is irrational.


Cauchy’s Differential Calculus 


Previous expositions of the calculus (with the exception of Lagrange and 
his attempted calculus as algebra without limits) had generally taken the 
differential in some form as the fundamental concept. The derivative of 
y = f(x) was then introduced as the “differential coefficient” in the expression
$dy = f'(x)\,dx$. Cauchy, by contrast, took as his starting point a clear-cut
definition of the derivative as the limit of a difference quotient: 


When a function y = f(x) remains continuous between two given limits 
of the variable x, and when one assigns to such a variable a value 
enclosed between the two limits at issue, then an infinitely small incre- 
ment assigned to the variable produces an infinitely small increment in 
the function itself. Consequently, if one puts Ax =i, the two terms of the 
ratio of differences 
$$\frac{\Delta y}{\Delta x} = \frac{f(x + i) - f(x)}{i}$$


will be infinitely small quantities. But though these two terms will ap- 
proach the limit zero indefinitely and simultaneously, the ratio itself can 
converge towards another limit, be it positive or be it negative. This limit, 
when it exists, has a definite value for each particular value of x; but it 
varies with x ... The form of the new function which serves as the limit 
of the ratio $[f(x + i) - f(x)]/i$ will depend on the form of the proposed
function y = f(x). In order to indicate this dependence, one gives the new 
function the name of derived function, and designates it with the aid of an 
accent by the notation, y’ or f’(x) ([4], pp. 22-23; [12], p. 240).
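
Cauchy's ratio of differences is easy to tabulate. The sketch below assumes the sample function sin x at x = 1, whose derived function is cos x, and a list of shrinking increments i. All of these are illustrative choices, not taken from Cauchy.

```python
import math

def difference_quotient(f, x, i):
    """Cauchy's ratio of differences [f(x + i) - f(x)] / i for i != 0."""
    return (f(x + i) - f(x)) / i

x = 1.0
increments = [10.0 ** (-k) for k in range(1, 7)]
quotients = [difference_quotient(math.sin, x, i) for i in increments]
errors = [abs(q - math.cos(x)) for q in quotients]
print(errors)
```

Numerator and denominator each tend to zero, yet the ratio settles toward cos 1. For much smaller increments floating-point rounding would eventually spoil the ratio, a limitation of the arithmetic rather than of the definition.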


After some typical computations of derivatives of elementary functions, 
Cauchy introduces what is now called the chain rule for computing the 
derivative of the composition of two functions ([4], p. 25; [12], p. 243):


Now let z be a second function of x, bound to the first y = f(x) by the 
formula 


    z = F(y).

z or F[f(x)] will be that which one calls a function of a function of a
variable x; and, if one designates the infinitely small and simultaneous
increments of x, y, and z by Δx, Δy, Δz, one will find

    \frac{\Delta z}{\Delta x} = \frac{F(y+\Delta y) - F(y)}{\Delta x}
        = \frac{F(y+\Delta y) - F(y)}{\Delta y} \cdot \frac{\Delta y}{\Delta x};

then, on passing to the limits,

    z' = y' F'(y) = f'(x) F'[f(x)].


Here (as sometimes happens in elementary calculus texts today) the
possibility that Δy = f(x + Δx) - f(x) = 0 for small non-zero values of Δx is
overlooked; in that case the quotient [F(y + Δy) - F(y)]/Δy used in the
argument is undefined.

His explicit formulation of continuity and differentiability in terms of 
limits was one of three features of Cauchy’s calculus that set the pattern 
for subsequent expositions of the subject. The second was the central role 
he accorded to the mean value theorem (which was known previously to 
Lagrange, but not extensively used by him). The third, his definition of
the integral and proof of the fundamental theorem of calculus, will be
discussed in the following section.

In modern textbooks the mean value theorem (if f is differentiable on
[a, b] then f(b) - f(a) = f’(ξ)(b - a) for some ξ ∈ (a, b)) is generally
proved by applying Rolle’s theorem to a suitably contrived auxiliary
function; this approach apparently was first discovered by Ossian Bonnet
(1819-1892). The fact that a positive (negative) derivative on an interval
corresponds to an increasing (decreasing) function on that interval is then
deduced as an immediate corollary to the mean value theorem.

By contrast, in the fourth of his Leçons sur le calcul différentiel, Cauchy
begins his approach to the mean value theorem by first investigating the
significance of the sign of the derivative. Because

    y' = \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x},


he observes that, if y’ > 0 at x_0, then Δy and Δx must have the same sign
for Δx sufficiently small (and different signs if y’ < 0). Hence y = f(x)
increases as x increases through x_0. Therefore, he says, if one increases x
“by insensible degrees” from x = x_0 to x = X, the function f(x) will be
increasing at all times that its derivative is positive, and decreasing when it
is negative ([5], p. 308, Corollary I). In particular, f(X) > f(x_0) if f’(x) > 0
on [x_0, X].

With this preparation, Cauchy is ready for his “generalized mean value
theorem”: Let f(x) and F(x) have continuous derivatives on the interval
[x_0, X], and suppose F’(x) is non-zero on this interval. Then

    \frac{f(X) - f(x_0)}{F(X) - F(x_0)} = \frac{f'(\xi)}{F'(\xi)}        (13)

for some point ξ ∈ (x_0, X). (See Theorem II and its first corollary ([5],
pp. 308-310).)

Cauchy’s proof is more clearly motivated than the auxiliary function
proof (which, however, does not require continuity of the derivatives).
Consider the case F’(x) > 0, and assume without loss of generality that
f(x_0) = F(x_0) = 0. Let A and B be the minimum and maximum values,
respectively, of the quotient f’(x)/F’(x) on the interval [x_0, X] (it may be
noted that Cauchy overlooks the need to prove that a continuous function
on a closed interval attains minimum and maximum values). Then

    f'(x) - A F'(x) \ge 0   and   f'(x) - B F'(x) \le 0.


Thus the derivative of f(x) - AF(x) is non-negative on [x_0, X], while that
of f(x) - BF(x) is non-positive. Hence the first of these latter functions is
non-decreasing on the interval, while the second is non-increasing. Since
both vanish at x_0, it follows that

    f(X) - A F(X) \ge 0   and   f(X) - B F(X) \le 0,

and hence (since F(X) > 0, F being increasing with F(x_0) = 0) that

    A \le \frac{f(X)}{F(X)} \le B.

An application of the intermediate value theorem to the quotient
f’(x)/F’(x) then gives a point ξ ∈ [x_0, X] such that Equation (13) holds
(remembering that f(x_0) = F(x_0) = 0).
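
The last step of this argument can be imitated numerically. The following is only a minimal sketch under stated assumptions: the functions f(x) = x^3 and F(x) = x^2 on [1, 2] are illustrative choices that do not come from Cauchy, and the helper name find_xi is hypothetical. It locates by bisection a point ξ at which f’(ξ)/F’(ξ) equals the quotient of increments.

```python
def find_xi(f_prime, F_prime, ratio, lo, hi, tol=1e-12):
    """Bisection for xi in [lo, hi] with f'(xi)/F'(xi) = ratio.

    Assumes g(x) = f'(x)/F'(x) - ratio is continuous and changes sign on
    [lo, hi], which is what the intermediate value theorem step requires.
    """
    def g(x):
        return f_prime(x) / F_prime(x) - ratio

    a, b = lo, hi
    if g(a) * g(b) > 0:
        raise ValueError("no sign change on the interval")
    while b - a > tol:
        mid = 0.5 * (a + b)
        if g(a) * g(mid) <= 0:
            b = mid          # a root lies in [a, mid]
        else:
            a = mid          # a root lies in [mid, b]
    return 0.5 * (a + b)


# Illustrative functions (an assumption, not taken from the text).
x0, X = 1.0, 2.0
f = lambda x: x**3
F = lambda x: x**2
f_prime = lambda x: 3 * x**2
F_prime = lambda x: 2 * x

ratio = (f(X) - f(x0)) / (F(X) - F(x0))   # quotient of increments, here 7/3
xi = find_xi(f_prime, F_prime, ratio, x0, X)
```

For this choice f’(x)/F’(x) = 3x/2, so the point found should agree with ξ = 14/9, which does lie between x_0 = 1 and X = 2.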


EXERCISE 8. Obtain the ordinary mean value theorem by substituting F(x)=x in 
Cauchy’s generalized mean value theorem. 


With X = x_0 + h and f(x_0) = F(x_0) = 0, Equation (13) becomes

    \frac{f(x_0 + h)}{F(x_0 + h)} = \frac{f'(x_0 + \theta h)}{F'(x_0 + \theta h)}

for some θ ∈ (0, 1). If also f’(x_0) = F’(x_0) = 0, then a second application of
the generalized mean value theorem gives

    \frac{f'(x_0 + \theta h)}{F'(x_0 + \theta h)} = \frac{f''(x_0 + \theta_1 h)}{F''(x_0 + \theta_1 h)}

for some θ_1 ∈ (0, 1). Continuing in this way, after n applications of the
generalized mean value theorem Cauchy obtains the result that

    \frac{f(x_0 + h)}{F(x_0 + h)} = \frac{f^{(n)}(x_0 + \theta h)}{F^{(n)}(x_0 + \theta h)}        (14)


for some θ ∈ (0, 1), under the assumptions that f and F and their first n - 1
derivatives vanish at x_0, the first n derivatives of f are continuous between
x_0 and x_0 + h, and that the first n derivatives of F are continuous and
non-zero between x_0 and x_0 + h ([5], p. 310, Corollary I). In his fifth leçon
he takes the limit in (14) as h → 0 to obtain l’Hospital’s rule

    \lim_{x \to x_0} \frac{f(x)}{F(x)} = \lim_{x \to x_0} \frac{f^{(n)}(x)}{F^{(n)}(x)}

for higher order 0/0 indeterminate forms (with the first n derivatives of f
and F continuous in a neighborhood of x_0, and the first n - 1 of them
having x_0 as an isolated zero).

In his seventh leçon Cauchy applies (14) to rigorously establish the
higher derivative test for local maxima and minima that had been known
to Euler and Maclaurin in the eighteenth century. Suppose that f and its
first n - 1 derivatives vanish at x_0. If F(x) = (x - x_0)^n, then F and its first
n - 1 derivatives also vanish at x_0, while F^{(n)}(x) = n!. Therefore Equation
(14) yields

    f(x_0 + h) = \frac{h^n}{n!} f^{(n)}(x_0 + \theta h)        (15)

for some θ ∈ (0, 1).


EXERCISE 9. Deduce from (15) that f has (a) a local maximum at x_0 if n is even and
f^{(n)}(x_0) < 0; (b) a local minimum at x_0 if n is even and f^{(n)}(x_0) > 0; (c) neither if n
is odd and f^{(n)}(x_0) ≠ 0.


In his eighth leçon Cauchy gives what is in some ways still the most
appealing derivation of Taylor’s formula with remainder. If f(x) is a
function whose first n derivatives are continuous on [x_0, x_0 + h], then the
function

    F(x) = f(x) - f(x_0) - f'(x_0)(x - x_0) - \frac{f''(x_0)}{2!}(x - x_0)^2
           - \cdots - \frac{f^{(n-1)}(x_0)}{(n-1)!}(x - x_0)^{n-1}

vanishes together with its first n - 1 derivatives at x_0, so

    F(x_0 + h) = \frac{h^n}{n!} F^{(n)}(x_0 + \theta h)

by Equation (15). But obviously F^{(n)}(x) = f^{(n)}(x), so this yields

    f(x_0 + h) = f(x_0) + f'(x_0) h + \cdots + \frac{f^{(n-1)}(x_0)}{(n-1)!} h^{n-1}
                 + \frac{f^{(n)}(x_0 + \theta h)}{n!} h^n.

With x_0 = a and h = x - a this becomes

    f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(x - a)^{n-1}
           + \frac{f^{(n)}(a + \theta(x - a))}{n!}(x - a)^n,

that is, Taylor’s formula with the Lagrange form of the remainder.
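
For one concrete function the intermediate value θ can actually be computed. The sketch below assumes f(x) = e^x expanded about a = 0 (an illustrative choice, not an example from Cauchy’s text), where every derivative is again e^x, so the Lagrange remainder e^{θx} x^n/n! can be solved for θ directly; the function names are hypothetical.

```python
import math


def taylor_polynomial_exp(x, n):
    """Terms of degree 0..n-1 of the Taylor series of exp about 0."""
    return sum(x**k / math.factorial(k) for k in range(n))


def lagrange_theta_exp(x, n):
    """Solve exp(theta*x) * x**n / n! = remainder for theta (needs x > 0)."""
    remainder = math.exp(x) - taylor_polynomial_exp(x, n)
    return math.log(remainder * math.factorial(n) / x**n) / x


x, n = 1.0, 4
theta = lagrange_theta_exp(x, n)

# Reassemble Taylor's formula with the Lagrange remainder as a check.
rebuilt = (taylor_polynomial_exp(x, n)
           + math.exp(theta * x) * x**n / math.factorial(n))
```

The value of θ obtained this way should fall strictly between 0 and 1, as the formula asserts, and rebuilt should reproduce e^x up to rounding error.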

On pages 360-361 of the Leçons we find the “Cauchy form” of the
remainder. Regarding x as a constant and a as a variable, Cauchy defines
the function φ(a) by the equation

    f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(x - a)^{n-1} + \varphi(a).        (16)

Then φ(x) = 0, and a simple computation yields

    \varphi'(a) = -\frac{f^{(n)}(a)}{(n-1)!}(x - a)^{n-1}.        (17)

Application of the (ordinary) mean value theorem to the function φ(a) on
the interval with end points a and x then gives

    \varphi(a) = \varphi(x) - (x - a)\,\varphi'(a + \theta(x - a)),

    \varphi(a) = \frac{f^{(n)}(a + \theta(x - a))}{(n-1)!}(1 - \theta)^{n-1}(x - a)^n,

the Cauchy form of the remainder. In succeeding lessons Cauchy applies
Taylor’s formula with remainder to rigorously establish the convergence of
the Taylor series of the elementary transcendental functions, e.g.


    e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots

and

    \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots


EXERCISE 10. Differentiate Equation (16) with respect to a to verify (17). 


The Cauchy Integral 


During the eighteenth century the integral was generally regarded simply 
as the inverse of the derivative. That is, a function f(x) was integrated by 
finding an antiderivative or primitive function F(x) such that F’(x) = f(x). 
The integral of f(x) over the interval [a, b] was then given, according to 
the still only heuristically understood fundamental theorem of calculus, by 


    \int_a^b f(x)\,dx = F(b) - F(a).


At the same time, the idea of the integral as some sort of limit of a sum, 
or as the area of an ordinate set under a curve, was familiar, but was 
generally relied upon only in approximating integrals when it was incon- 
venient or impossible to find the antiderivative needed in order to apply 
the fundamental theorem of calculus. Neither limits of sums nor areas of 
plane sets were sufficiently well understood to provide a solid basis for a 
logical treatment of the integral. In particular, the notion of area was still 
wholly intuitive—it was regarded as a self-evident concept, and no need 
for a precise definition had yet been perceived. Indeed, the analytical 
integral in the antiderivative sense of Newton was adequate in practice, so 
long as the only functions to be integrated were continuous in the sense of 
Euler, that is, each such function was defined by a single explicit analytical 
expression. 

But in the early nineteenth century, as we have seen, the work of Fourier 
brought to light the need to make integration meaningful for functions that 
are discontinuous (at least in the sense of Euler). Such functions appeared 
naturally in applied problems, and the coefficients of their Fourier series 
were expressed as integrals that did not fit the narrowly analytical pattern 
of eighteenth century integration. 

It was Cauchy who first addressed this necessity “to demonstrate the 
existence of the integrals or primitive functions before making known their 
diverse properties”, that is, to first provide a general definition and proof
of the existence of the integral for a broad class of functions that could
then provide a basis for the discussion of particular integrals and their
properties. In his Résumé des leçons données à l’École Royale Polytechnique
sur le calcul infinitésimal of 1823 he formulated the definition of the
integral which (as later completed by Riemann) appears in modern
elementary treatments of the integral calculus.

In his twenty-first leçon Cauchy starts with a function f(x) that is
continuous (in the modern sense) on the interval [x_0, X], and subdivides
this interval into n subintervals by means of the points x_0, x_1, ..., x_n = X.
With this subdivision or partition P of [x_0, X] he associates the
approximating sum

    S = \sum_{i=1}^{n} f(x_{i-1})(x_i - x_{i-1})        (18)

obtained by adding the areas of rectangles based on the subintervals of the
partition, the rectangle with base [x_{i-1}, x_i] having height f(x_{i-1}). He wants
to define the integral ∫_{x_0}^{X} f(x) dx as the limit of the sum (18) as the
maximum of the lengths x_i - x_{i-1} of the subintervals approaches zero. Of
course the existence of this limit must be established. To this end, he says,


It is important to remark that if the numerical values [lengths] of these 
elements [subintervals] become very small and the number n very large 
the mode of subdivision will have only an imperceptible influence on the 
value of S ([4], p. 122; [12], p. 261).


To prove this he applies the following elementary arithmetical result
from his Cours d’analyse ([3], p. 28): If α_1, ..., α_n are positive numbers,
and a_1, ..., a_n are arbitrary numbers, then

    \sum_{i=1}^{n} \alpha_i a_i = \bar{a}\,(\alpha_1 + \cdots + \alpha_n)        (19)

where ā is a “mean” of the numbers a_1, ..., a_n, that is, ā lies between the
largest and smallest of them. With α_i = x_i - x_{i-1} and a_i = f(x_{i-1}), (19)
yields

    S = f(x_0 + \theta(X - x_0))(X - x_0)        (20)

for some θ ∈ (0, 1), because by the intermediate value theorem any mean
of the numbers f(x_0), ..., f(x_{n-1}) is a value of the continuous function f
at some point of the interval.
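
The passage from (19) to (20) can be checked on a small example. The following sketch assumes f(x) = e^x on [0, 1] and an arbitrarily chosen uneven partition; both are illustrative assumptions. Because e^x is increasing with inverse log, the point at which f takes the mean value can be written down without a root search.

```python
import math


def cauchy_sum(f, points):
    """Left-endpoint sum (18): sum of f(x_{i-1}) * (x_i - x_{i-1})."""
    return sum(f(points[i - 1]) * (points[i] - points[i - 1])
               for i in range(1, len(points)))


f = math.exp
points = [0.0, 0.1, 0.35, 0.6, 1.0]      # x_0 < x_1 < ... < x_n = X
x0, X = points[0], points[-1]

s = cauchy_sum(f, points)
mean = s / (X - x0)                       # the weighted mean of (19)
values = [f(x) for x in points[:-1]]      # the numbers a_i = f(x_{i-1})

# exp is increasing, so the point where f equals the mean is log(mean).
theta = (math.log(mean) - x0) / (X - x0)
```

Here mean should lie between the smallest and largest of the values f(x_{i-1}), and θ between 0 and 1, so that S = f(x_0 + θ(X - x_0))(X - x_0) as in (20).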

Now Cauchy considers a refinement P’ of the above partition P, that is, 
each subinterval of the partition P’ lies in some subinterval of P. Then the 
sum S’ of the form (18) associated with this new partition can be written as 


    S' = S'_1 + S'_2 + \cdots + S'_n

where S’_i is the sum of those terms of S’ that correspond to subintervals of
P’ that lie in the ith subinterval of P. Then (20), applied on this ith
subinterval, gives

    S'_i = f(x_{i-1} + \theta_i (x_i - x_{i-1}))(x_i - x_{i-1})

for some θ_i ∈ (0, 1), i = 1, ..., n, so

    S' = \sum_{i=1}^{n} f(x_{i-1} + \theta_i (x_i - x_{i-1}))(x_i - x_{i-1}).        (21)

If we write

    \varepsilon_i = f(x_{i-1} + \theta_i (x_i - x_{i-1})) - f(x_{i-1})        (22)

for each i = 1, ..., n, then comparison of (18) and (21) yields

    S' - S = \sum_{i=1}^{n} \varepsilon_i (x_i - x_{i-1}) = \bar{\varepsilon}\,(X - x_0)        (23)

for some mean ε̄ of ε_1, ..., ε_n.


Cauchy concludes from (23) that “one will not alter perceptibly the
value of S calculated by a mode of division [partition] in which the
elements [subintervals] of the difference X - x_0 have very small numerical
values, if one passes to a second mode in which each of these elements is
subdivided into many others” ([4], p. 125). This is where he overlooks the
need to prove that the continuous function f is uniformly continuous on
[x_0, X], that is, that given ε > 0 there exists δ > 0 such that
|f(x’) - f(x”)| < ε for any two points x’, x” ∈ [x_0, X] with |x’ - x”| < δ.
Knowing this, the numbers ε_i defined by (22) could be made as small as
desired by choosing P with sufficiently short subintervals.

Now let P_1 and P_2 be arbitrary partitions of [x_0, X], and let P’ be the
common refinement obtained by amalgamating the points of subdivision
of P_1 and P_2. If S_1, S_2, S’ are the associated approximating sums, then (23)
gives

    S' - S_1 = \bar{\varepsilon}_1 (X - x_0)   and   S' - S_2 = \bar{\varepsilon}_2 (X - x_0),

so

    S_1 - S_2 = (\bar{\varepsilon}_2 - \bar{\varepsilon}_1)(X - x_0).

Hence the difference between S_1 and S_2 can be made arbitrarily small by
choosing P_1 and P_2 with sufficiently short subintervals.

Cauchy summarizes this situation as follows ([4], p. 125; [12], p. 265):


Conceive for the present that one considers at the same time two modes 
of division of the difference X - x_0, in each of which the elements of the
difference have very small numerical values. One will be able to compare 
these two modes with a third in such a way that each element, be it from 
the first, or from the second mode is formed by the union of several 
elements of the third. For this condition to be satisfied, it will suffice that 
each of the values of x interplaced in the first two modes between the
limits x_0 and X be employed in the third, and one will prove that one
alters the value of S very little in passing from the first or the second 
mode to the third, and consequently, in passing from the first to the 
second. Therefore, when the elements of the difference X - x_0 become
infinitely small, the mode of division has no more than an imperceptible 
influence on the value of S; and, if one makes the numerical values of 
these elements decrease indefinitely, by increasing their number, the value 
of S will end by being perceptibly constant or, in other words, it will end 
by attaining a certain limit which will depend solely on the form of the 
function f(x) and on the extreme values x_0 and X attributed to the
variable x. This limit is that which one calls a definite integral. 


To clinch his final argument—the actual existence of the limit—Cauchy 
would have needed a completeness property of the real numbers (just as in 
the earlier problem of the existence of the limit of a Cauchy sequence of 
numbers). 
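
Cauchy’s comparison of two partitions through their common refinement is easy to reproduce numerically. The following is a minimal sketch, assuming f = cos on [0, 1] and two uniform partitions (illustrative choices only; the helper names are hypothetical). Since |cos’| ≤ 1, each ε_i in (22) is at most the length of the subinterval it belongs to, so by (23) the sum for a partition differs from the sum for its refinement by at most the norm of the coarser partition times X - x_0.

```python
import math


def cauchy_sum(f, points):
    """Left-endpoint sum (18) for the partition given by sorted points."""
    return sum(f(points[i - 1]) * (points[i] - points[i - 1])
               for i in range(1, len(points)))


def uniform_partition(a, b, n):
    """Partition of [a, b] into n equal subintervals."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def common_refinement(p1, p2):
    """Amalgamate the points of subdivision of two partitions."""
    return sorted(set(p1) | set(p2))


f = math.cos
p1 = uniform_partition(0.0, 1.0, 200)     # norm 1/200
p2 = uniform_partition(0.0, 1.0, 300)     # norm 1/300
p3 = common_refinement(p1, p2)

s1 = cauchy_sum(f, p1)
s2 = cauchy_sum(f, p2)
s3 = cauchy_sum(f, p3)
```

All three sums should be close to sin 1, the value supplied by the fundamental theorem, with |s3 - s1| ≤ 1/200 and |s3 - s2| ≤ 1/300 as (23) predicts for this function.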


EXERCISE 11. Let f be a continuous function on [a, b], and denote by S_n the Cauchy
sum (18) associated with the partition of [a, b] into 2^n equal subintervals. Then use
Cauchy’s results to prove that {S_n}_1^∞ is a Cauchy sequence, that is, given ε > 0
there exists an integer N such that |S_m - S_n| < ε if m, n > N.


EXERCISE 12. Let {P_n}_1^∞ and {P’_n}_1^∞ be two sequences of partitions of the interval
[a, b] into subintervals whose lengths approach zero as n → ∞. Let {S_n}_1^∞ and
{S’_n}_1^∞ be the associated Cauchy sums (18) for a continuous function f on [a, b].
Assuming that lim_{n→∞} S_n = I, use Cauchy’s results to show that lim_{n→∞} S’_n = I
also.


In the twenty-second leçon ([4], p. 131), Cauchy argues that Equation
(20) for approximating sums carries over to the integral itself, that is,

    \int_{x_0}^{X} f(x)\,dx = f(x_0 + \theta(X - x_0))(X - x_0) = f(\bar{x})(X - x_0)        (24)

for some θ ∈ [0, 1] or x̄ ∈ [x_0, X]. This is the “mean value property of
integrals”. In the twenty-third leçon ([4], pp. 134-136) he carefully observes
that the simple properties

    \int_{x_0}^{X} [a f(x) + b g(x)]\,dx = a \int_{x_0}^{X} f(x)\,dx + b \int_{x_0}^{X} g(x)\,dx        (25)

and

    \int_{x_0}^{X} f(x)\,dx = \int_{x_0}^{\xi} f(x)\,dx + \int_{\xi}^{X} f(x)\,dx        (26)

follow from the definition of the integral as a limit of a sum.

In his twenty-sixth leçon ([4], pp. 151-155) Cauchy presents the rigorous
formulation of the “fundamental theorem of calculus” that is duplicated in
almost every modern calculus text. Having given an arithmetical definition
of the integral, he could now establish the inverse relationship between
differentiation and integration without relying on intuitive area concepts.
Given a continuous function f(x) on the interval [x_0, X], he wants to prove
that the new function F(x) defined for x ∈ [x_0, X] by

    F(x) = \int_{x_0}^{x} f(x)\,dx        (27)

is a primitive function or antiderivative of f(x), that is, F’(x) = f(x) on
[x_0, X]. The only change of detail in his exposition that one might make
today would be to write f(t) dt instead of f(x) dx in the integrand of (27),
so as to distinguish the dummy variable of integration from the variable
upper limit.

Applying property (26) and then the mean value property (24), Cauchy
notes that

    F(x + \alpha) - F(x) = \int_{x_0}^{x+\alpha} f(x)\,dx - \int_{x_0}^{x} f(x)\,dx
                         = \int_{x}^{x+\alpha} f(x)\,dx,

    F(x + \alpha) - F(x) = \alpha f(x + \theta\alpha)        (28)

for some θ ∈ [0, 1]. Dividing both sides of (28) by α and then taking limits
as α → 0, he concludes from the definition of the derivative and the
continuity of f that

    F'(x) = f(x)        (29)

as desired. Thus

    \frac{d}{dx}\left( \int_{x_0}^{x} f(t)\,dt \right) = f(x)        (30)

if f is continuous.
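
A rough numerical check of (28)-(30) is sketched below, under the assumptions that f = cos, x_0 = 0, and that the integral in (27) is approximated by a left-endpoint sum with a fixed number of equal subintervals; these choices and the function names are illustrative only. The difference quotient of the approximate F should then be close to f(x) for small α.

```python
import math


def integral(f, a, b, n=20000):
    """Left-endpoint Cauchy sum on n equal subintervals of [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))


def F(x, f=math.cos, x0=0.0):
    """Approximation to the function F of (27) with lower limit x0."""
    return integral(f, x0, x)


x, alpha = 0.7, 1e-3
quotient = (F(x + alpha) - F(x)) / alpha   # compare with f(x) = cos(0.7)
```

The quotient is not exactly cos(0.7), both because α is not zero and because the sums only approximate the integrals, but the discrepancy should be well below 10^-3 for these parameters.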

To deduce from this first form the second familiar form of the
fundamental theorem, Cauchy considers an arbitrary function ℱ(x) such that
ℱ’(x) = f(x) on [x_0, X]. If

    w(x) = F(x) - \mathcal{F}(x),

then w’(x) = F’(x) - ℱ’(x) = f(x) - f(x) = 0, so the mean value theorem
gives

    w(x) = w(x_0) + (x - x_0)\,w'(x_0 + \theta(x - x_0)) = w(x_0)

for all x ∈ [x_0, X]. Therefore, since F(x_0) = 0 by (27),

    F(x) - \mathcal{F}(x) = F(x_0) - \mathcal{F}(x_0) = -\mathcal{F}(x_0),

    F(x) = \mathcal{F}(x) - \mathcal{F}(x_0),

    \int_{x_0}^{X} f(x)\,dx = \mathcal{F}(X) - \mathcal{F}(x_0)        (31)

for any antiderivative ℱ of f.

Thus we see that Cauchy essentially completed the general theory of
integration of continuous functions on closed intervals. Although we have 
seen that quite general definitions of functions were formulated in the 
early nineteenth century, it does not appear that anyone then took seri- 
ously the importance for analysis (or perhaps even the existence) of 
functions having more than a finite number of discontinuities in each finite 
interval. It may be observed that Cauchy’s theory of integration for 
continuous functions also suffices for piecewise continuous functions. For 
if the interval [x_0, X] is partitioned into subintervals [x_{i-1}, x_i],
i = 1, ..., n, such that f agrees on (x_{i-1}, x_i) with a function f_i that is
continuous on [x_{i-1}, x_i], then the integral of f is satisfactorily defined by

    \int_{x_0}^{X} f(x)\,dx = \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} f_i(x)\,dx.

In addition, Cauchy considered integrals of functions having isolated
infinite discontinuities, that is, improper integrals. For example, if
lim_{x→X} f(x) = +∞ but f is continuous on [x_0, X - ε] for each ε > 0, he
defined the integral of f on [x_0, X] by

    \int_{x_0}^{X} f(x)\,dx = \lim_{\varepsilon \to 0} \int_{x_0}^{X-\varepsilon} f(x)\,dx

provided that this limit exists.


The Riemann Integral and Its Reformulations 


Genuinely discontinuous functions entered the mainstream of mathematics 
through the work of G. B. F. Riemann (1826-1866) on the convergence of 
Fourier series. In the course of extending the applicability of Dirichlet’s 
convergence proof to a wider class of functions, Riemann formulated the 
generalization of Cauchy’s integral that to this day remains the most 
convenient and useful one for elementary applications of the calculus. 

In his 1854 “Habilitationsschrift” Riemann, who in 1859 would succeed
Dirichlet in the Göttingen chair that Gauss had occupied, took under
Dirichlet’s influence a fresh look at the representability of functions by 
means of trigonometric series. Although willing to concede that the more 
general functions he proposed to consider probably do not occur in nature, 
he felt an investigation of their Fourier series would be worthwhile because 
“this subject is closely related to the principles of the infinitesimal calculus 
and can serve to bring greater clarity and precision to these principles” 
([16], p. 238). This 1854 investigation, which did not appear in print until
1867, is reprinted in Riemann’s collected works ([16], pp. 227-265), and
selected portions are translated in Birkhoff’s source book ([1], pp. 16-23).

The first three sections of the paper are devoted to a summary of the 
history of Fourier series. In his 1829 paper Dirichlet had shown that, if the
period 2π function f(x)

(a) is piecewise continuous and
(b) has only finitely many local maxima and minima in the interval
[-π, π],

then its Fourier series converges pointwise to ½[f(x + 0) + f(x - 0)], the
average of its righthand and lefthand limits (assuming that these exist at 
every point). However, it appeared that the only need for the piecewise 
continuity assumption (a) was to ensure the integrability of the function 
f(x) and the meaningfulness of the integrals appearing as its Fourier 
coefficients. 

Riemann therefore poses at the beginning of Section 4 the question 
“What is one to understand by ∫_a^b f(x) dx?”, to which he immediately
supplies the following answer ([16], p. 239; [1], p. 22): 


In order to establish this, we take a sequence of values x_1, x_2, ..., x_{n-1}
lying between a and b and ordered by size, and for brevity, denote x_1 - a
by δ_1, x_2 - x_1 by δ_2, ..., b - x_{n-1} by δ_n, and proper positive fractions by
ε_i. Then the value of the sum

    S = \delta_1 f(a + \varepsilon_1 \delta_1) + \delta_2 f(x_1 + \varepsilon_2 \delta_2) + \delta_3 f(x_2 + \varepsilon_3 \delta_3)
        + \cdots + \delta_n f(x_{n-1} + \varepsilon_n \delta_n)

will depend on the choice of the intervals δ_i and the quantities ε_i. If it has
the property that, however the δ_i and the ε_i may be chosen, it tends to a
fixed limit A as soon as all the δ_i become infinitely small, then this value is
called ∫_a^b f(x) dx. If it does not have this property, then ∫_a^b f(x) dx is
meaningless.


Thus Riemann chooses an arbitrary point x̄_i = x_{i-1} + ε_i δ_i in the ith
subinterval [x_{i-1}, x_i] of his partition, i = 1, ..., n, and defines the integral
by

    \int_a^b f(x)\,dx = \lim_{\delta \to 0} \sum_{i=1}^{n} f(\bar{x}_i)(x_i - x_{i-1}),        (32)

where δ denotes the maximum of the lengths δ_i of the subintervals of the
partition of [a, b]. This is a direct generalization of Cauchy’s definition,

    \int_a^b f(x)\,dx = \lim_{\delta \to 0} \sum_{i=1}^{n} f(x_{i-1})(x_i - x_{i-1}).

Riemann has simply replaced Cauchy’s initial point x_{i-1} with an arbitrary
point x̄_i of [x_{i-1}, x_i], and insists (if the integral is to exist) that the
approximating sums thereby associated with a partition approach a fixed
value (the integral) as the “norm” δ approaches zero, independently of the
choice of the points x̄_i.

Now he says, “Let us determine the extent of the validity of this concept,
and ask: in what cases is a function integrable and in what cases is it not?”
([16], p. 240; [1], p. 22). Starting with a bounded function f and a partition
P of [a, b], he considers the “total oscillation” D(P) = D_1 δ_1 + D_2 δ_2
+ ... + D_n δ_n of f(x) with respect to P, where δ_i = x_i - x_{i-1} and D_i
denotes the difference between the largest and smallest values of f(x) on
the ith subinterval [x_{i-1}, x_i]. Sidestepping (or overlooking) the question of
the completeness of the real numbers, he takes it for granted that the
integral (32) exists if and only if D(P) → 0 as the norm δ → 0,

    \lim_{\delta \to 0} (D_1 \delta_1 + D_2 \delta_2 + \cdots + D_n \delta_n) = 0.        (33)
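
The role of the total oscillation can be seen in a small experiment. The sketch below assumes the monotone function f(x) = x^3 on [0, 1] and a randomly generated partition (illustrative assumptions; the helper names are hypothetical). For a non-decreasing function the oscillation D_i on [x_{i-1}, x_i] is just f(x_i) - f(x_{i-1}), so D(P) can be computed exactly, and any two Riemann sums for the same partition can differ by at most D(P).

```python
import random


def riemann_sum(f, points, tags):
    """Sum of f(tag_i) * (x_i - x_{i-1}), one tag per subinterval."""
    return sum(f(t) * (points[i + 1] - points[i]) for i, t in enumerate(tags))


def total_oscillation_monotone(f, points):
    """D(P) for a non-decreasing f, where D_i = f(x_i) - f(x_{i-1})."""
    return sum((f(points[i + 1]) - f(points[i])) * (points[i + 1] - points[i])
               for i in range(len(points) - 1))


def random_partition(a, b, n, rng):
    """Partition of [a, b] with n - 1 random interior points."""
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    return [a] + interior + [b]


rng = random.Random(0)                    # fixed seed for repeatability
f = lambda x: x**3                        # non-decreasing on [0, 1]
points = random_partition(0.0, 1.0, 500, rng)
n_sub = len(points) - 1
norm = max(points[i + 1] - points[i] for i in range(n_sub))

tags_a = [rng.uniform(points[i], points[i + 1]) for i in range(n_sub)]
tags_b = [rng.uniform(points[i], points[i + 1]) for i in range(n_sub)]

d_p = total_oscillation_monotone(f, points)
gap = abs(riemann_sum(f, points, tags_a) - riemann_sum(f, points, tags_b))
```

One expects gap ≤ D(P) ≤ [f(1) - f(0)]·δ, where δ is the norm of the partition, so both quantities shrink as the partition is made finer.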


EXERCISE 13. Let {S_n}_1^∞ and {S’_n}_1^∞ be two different sequences of Riemann sums
(corresponding to different choices of the points x̄_i) associated with a sequence of
partitions whose norms approach zero as n → ∞. Assuming that (33) holds, show
that lim_{n→∞} S_n = lim_{n→∞} S’_n provided that one of these limits exists.


EXERCISE 14. If f(x) is a monotone non-decreasing function on [a, b], show that
D(P) ≤ Dδ where D is the oscillation of f(x) on [a, b]. Hence conclude that f
satisfies (33) and is therefore integrable.


EXERCISE 15. Let φ be Dirichlet’s discontinuous function such that φ(x) = 0 if x is
rational, but φ(x) = 1 if x is irrational, x ∈ [0, 1]. Show that D(P) = 1 for any
partition P of [0, 1], and therefore conclude that φ is not Riemann integrable.


Next Riemann defines Δ = Δ(d) as the maximum value of the total
oscillation D(P) for all partitions P with norm δ ≤ d. Then Δ(d)
obviously decreases (or at least does not increase) as d decreases, and f is
integrable on [a, b] if and only if lim_{d→0} Δ(d) = 0. Given σ > 0 and a
partition P he denotes by s = s(σ, P) the sum of the lengths δ_i of those
subintervals of P for which the oscillation D_i > σ. He now establishes the
following necessary and sufficient condition for the existence of the
integral of a bounded function.


If f(x) is bounded for x ∈ [a, b], then ∫_a^b f(x) dx exists if and only if,
given σ > 0, it follows that s(σ, P) approaches zero as the norm of the
partition P approaches zero.


That is, given σ > 0 and ε > 0, there exists d > 0 such that for any partition
P with norm δ < d, the sum s of the lengths of those subintervals of P, on
which the oscillation of f(x) is greater than σ, is less than ε.

To see that this condition is necessary for the integrability of f, note that

    \sigma s \le D_1 \delta_1 + D_2 \delta_2 + \cdots + D_n \delta_n \le \Delta(d)

if δ ≤ d, because D_i > σ on subintervals having a total length of s. Therefore
s(σ, P) ≤ Δ(d)/σ, which approaches zero as d → 0 with σ fixed, assuming
that ∫_a^b f(x) dx exists.

To see that the above condition is sufficient for integrability, let σ > 0
and ε > 0 be given, and choose d > 0 as above. If the norm of the partition
P is less than d, then those subintervals on which the oscillation of f(x) is
greater than σ contribute to D(P) an amount less than Dε (because their
lengths add up to s < ε), where D is the oscillation of f(x) on [a, b]. The
remaining subintervals contribute to D(P) an amount at most σ(b - a).
Hence

    D(P) < D\varepsilon + \sigma(b - a),

so D(P) can be made as small as desired by taking ε and σ sufficiently
small. Thus condition (33) is satisfied, so ∫_a^b f(x) dx exists.

Riemann’s theorem immediately implies that ∫_a^b f(x) dx exists if f is
uniformly continuous on [a, b]. For in this case, given σ > 0, there exists
d > 0 such that the oscillation of f(x) is less than σ on any subinterval of
length less than d. Therefore s(σ, P) = 0 if the norm of the partition P is
less than d.


EXERCISE 16. Apply condition (33) to show that any uniformly continuous function 
is integrable. 


However, Riemann refrained from concluding that every continuous 
function is integrable. The uniform continuity of a continuous function on 
a closed interval was not rigorously established until the early 1870’s when 
the Bolzano-Weierstrass theorem (stated by Bolzano but proved by 
Weierstrass in his lectures) was available. According to this theorem, every
infinite sequence of points in a closed bounded interval has a subsequence
that converges to some point of the interval. If the function f is not
uniformly continuous on [a, b], then for some δ > 0 and each positive integer
n there exist points a_n, b_n ∈ [a, b] such that |a_n - b_n| < 1/n but
|f(a_n) - f(b_n)| ≥ δ. By the Bolzano-Weierstrass theorem it may be assumed
without loss of generality that the sequences {a_n}_1^∞ and {b_n}_1^∞ both
converge to some point c ∈ [a, b]. It follows that f is not continuous at c
(why?).


In the opposite direction, Riemann pointed out that a function can be 
discontinuous at a dense set of points but nevertheless be integrable. 
“Since these functions are as yet nowhere considered, it will be good to 
start with a specific example” ([16], p. 242). His example is described as
follows. For each real number x, let (x) = x - i(x) where i(x) is the integer
nearest to x, unless x is an odd multiple of 1/2, in which case (x) = 0. Then
-1/2 < (x) < 1/2 for all x, and (x) is continuous at x unless x is an odd
multiple of 1/2, in which case (x) has a “jump”, or difference of lefthand and
righthand limits, of (x - 0) - (x + 0) = 1. Riemann’s exotic function is then
defined by

    f(x) = \frac{(x)}{1} + \frac{(2x)}{4} + \frac{(3x)}{9} + \cdots = \sum_{k=1}^{\infty} \frac{(kx)}{k^2}.        (34)


If x is not a rational number of the form m/2n, where m and n are
relatively prime integers with m odd, then kx is not an odd multiple of 1/2
for any k. Consequently each term of (34) is continuous at x and it can be
proved, as expected, that f is continuous at x.

But if x is of the indicated form m/2n, then kx is an odd multiple of 1/2
when k is an odd multiple of n, k = n(2p + 1). In this case the term (kx)/k^2
of (34) has a (negative) jump of 1/k^2 = 1/[n^2 (2p + 1)^2] at x = m/2n. This
indicates (and it can be verified) that f is discontinuous at each such point
x = m/2n, having there a jump

    f(x - 0) - f(x + 0) = \frac{1}{n^2} \sum_{p=0}^{\infty} \frac{1}{(2p+1)^2} = \frac{\pi^2}{8n^2},

using one of Euler’s summations. Of course these points of discontinuity
are dense in every interval.
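
The size of the jump can be checked numerically at the simplest point x = 1/2 (the case m = n = 1). The following sketch truncates the series (34) after a fixed number of terms and evaluates it just to the left and right of 1/2; the truncation length and the offset h are arbitrary assumptions, chosen so that no other discontinuity of the retained terms lies between the two evaluation points, and the function names are hypothetical.

```python
import math


def saw(x):
    """Riemann's (x): x minus the nearest integer, 0 at odd multiples of 1/2."""
    if x - math.floor(x) == 0.5:
        return 0.0
    return x - math.floor(x + 0.5)


def partial_sum(x, terms):
    """Sum of (kx)/k^2 for k = 1, ..., terms, a truncation of the series (34)."""
    return sum(saw(k * x) / k**2 for k in range(1, terms + 1))


terms, h = 2000, 1e-7          # h is far smaller than 1/(2*terms)
jump = partial_sum(0.5 - h, terms) - partial_sum(0.5 + h, terms)
predicted = math.pi**2 / 8     # the value of the jump formula for n = 1
```

Because the series is truncated, the computed jump falls slightly short of π^2/8; the shortfall is roughly the tail of the sum over odd k beyond the cutoff.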

On the other hand, if an interval [a, b] and a number σ > 0 are given,
there are only a finite number of these points x = m/2n in [a, b] such that
π^2/8n^2 > σ. Consequently the sum s(σ, P) of the lengths of the subintervals
containing these latter points can be made arbitrarily small by choosing
the norm of the partition P small enough. Thus Riemann’s function (34)
satisfies his sufficient condition for integrability, so ∫_a^b f(x) dx exists despite
the denseness of the set of discontinuities of f.


Riemann’s definition (32) of the integral was the most general one that 
could be based directly on Cauchy’s original device of approximating sums 
associated with partitions of the interval of integration into subintervals. 
Nevertheless, during the last three decades of the nineteenth century, this 
definition was reformulated in several ways that further illuminated the 
concept of the integral and paved the way for important additional 
generalizations in the early twentieth century. 

In the middle 1870’s several authors independently introduced the
so-called upper and lower Riemann sums for the bounded function f on
the interval [a, b],

    U(P) = \sum_{i=1}^{n} M_i (x_i - x_{i-1})   and   L(P) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}),        (35)

where P is a partition of [a, b] into n subintervals, and M_i and m_i are the
maximum and minimum values (actually, the least upper and greatest
lower bounds) of f(x) on the ith subinterval [x_{i-1}, x_i]. Today these are
often called “Darboux sums” after Gaston Darboux (1842-1917); see [6].


EXERCISE 17. If P’ is a refinement of the partition P, show that
L(P) ≤ L(P’) ≤ U(P’) ≤ U(P).


Using this observation, it is easily verified that the upper and lower sums
U(P) and L(P) approach limits U and L, respectively, as the norm δ of the
partition P approaches zero, whether or not the bounded function f is
integrable. In the 1880’s Vito Volterra (1860-1940) introduced the terms
“upper integral” and “lower integral” for U and L, together with the
descriptive notation

    U = \overline{\int_a^b} f(x)\,dx   and   L = \underline{\int_a^b} f(x)\,dx,

and Giuseppe Peano (1858-1932) noted that these upper and lower
integrals could be defined conveniently as the greatest lower and least upper
bounds of the upper and lower Riemann sums, respectively, for all
partitions P of the interval [a, b],

    \overline{\int_a^b} f(x)\,dx = \mathrm{glb}\{ U(P) \}   and   \underline{\int_a^b} f(x)\,dx = \mathrm{lub}\{ L(P) \}.        (36)

The function f is then integrable if and only if its upper and lower integrals
are equal, \underline{\int_a^b} f(x) dx = \overline{\int_a^b} f(x) dx.
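
The behaviour of the upper and lower sums under refinement can be illustrated in a case where the bounds M_i and m_i are known exactly. The sketch below assumes the increasing function f(x) = x^2 on [0, 1], for which m_i = f(x_{i-1}) and M_i = f(x_i); the function and the two partitions are illustrative choices, and the helper name is hypothetical.

```python
def darboux_sums_increasing(f, points):
    """Return (L(P), U(P)) for a non-decreasing f on the partition points.

    For such f the greatest lower bound on [x_{i-1}, x_i] is f(x_{i-1})
    and the least upper bound is f(x_i).
    """
    widths = [points[i + 1] - points[i] for i in range(len(points) - 1)]
    lower = sum(f(points[i]) * w for i, w in enumerate(widths))
    upper = sum(f(points[i + 1]) * w for i, w in enumerate(widths))
    return lower, upper


f = lambda x: x * x
coarse = [i / 10 for i in range(11)]      # partition P, norm 1/10
fine = [i / 20 for i in range(21)]        # refinement P', norm 1/20

l_p, u_p = darboux_sums_increasing(f, coarse)
l_q, u_q = darboux_sums_increasing(f, fine)
```

One expects L(P) ≤ L(P’) ≤ 1/3 ≤ U(P’) ≤ U(P), with 1/3 the common value of the upper and lower integrals of x^2 over [0, 1].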


EXERCISE 18. Use (36) and Exercise 17 to show that \underline{\int_a^b} f(x) dx ≤ \overline{\int_a^b} f(x) dx.


EXERCISE 19. If φ is Dirichlet’s function of Exercise 15, show that \underline{\int_0^1} φ(x) dx = 0
while \overline{\int_0^1} φ(x) dx = 1.


Since the seventeenth century the idea of the integral had always been 
motivated by the concept of area. In particular, if O_f denotes the ordinate
set of the non-negative function f on the interval [a, b] (the set of all points
(x, y) with a ≤ x ≤ b and 0 ≤ y ≤ f(x)), the idea was that the value of the
integral ∫_a^b f(x) dx should be the area a(O_f). Yet, prior to the late
nineteenth century, the concept of area itself had been wholly intuitive and not
based on any precise definition.

The first formal mathematical definition of area apparently was given by 
Peano in a book published in 1887 [15]. Beginning with Eudoxus and his 
method of exhaustion, it had always been taken as obvious that the area of 
a plane set S is the upper bound of the areas of all polygons that are 
contained in S, and the lower bound of the areas of all polygons that 
contain S (of course the area of a polygon is obtained by dissecting it into 
triangles). 

Peano took this ancient idea as the starting point for an actual definition
of area. He defined the inner area $a_i(S)$ of S as the least upper bound of
the areas of all polygons that are contained in S, and the outer area $a_o(S)$
as the greatest lower bound of the areas of all polygons that contain S. It is
clear that $a_i(S) \le a_o(S)$, but the two may not be equal. For example, if S is
the set of all points (x, y) in the unit square $0 \le x, y \le 1$ such that the
numbers x and y are both irrational, then $a_o(S) = 1$, the area of the square,
but $a_i(S) = 0$ because only degenerate polygons are contained in S.

With Peano’s definition of inner and outer area, it was easy to establish
that

$$\underline{\int_a^b} f(x)\,dx = a_i(O_f) \quad\text{and}\quad \overline{\int_a^b} f(x)\,dx = a_o(O_f) \tag{37}$$

for any non-negative function f on [a, b]. In case $a_i(S) = a_o(S)$, the
common value is the area a(S) of S. If f is integrable, then (37) reduces to

$$\int_a^b f(x)\,dx = a(O_f). \tag{38}$$


At this point the concept of the integral had come full circle, back to its 
original motivation. 


Peano’s area is now generally referred to as “Jordan content” because of
its definitive treatment in the second edition (1893) of Camille Jordan’s
influential Cours d’analyse [13]. With the minor difference that he uses
only polygons that are made up of small squares with horizontal and
vertical sides, Jordan defines what he calls the inner content $c_i(S)$ and
outer content $c_o(S)$ of a plane set S, equivalent to Peano’s inner and outer
areas. However, his approach works equally well in all dimensions, so his
concept of content (“étendue”) simultaneously generalizes the concepts of
length, area, and volume ([13], pp. 28-31). He calls the set S measurable
with content c(S) if $c_i(S) = c_o(S)$.
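
Jordan's inner and outer content can be approximated by counting grid squares. The sketch below does this for the closed unit disk, an illustrative set that does not appear in the text; the function name `disk_contents` and the mesh sizes are likewise assumptions. Squares lying wholly inside the disk give a lower estimate of its content, and squares meeting the disk give an upper estimate.

```python
import math

def disk_contents(n):
    """Inner and outer grid estimates for the unit disk, mesh 1/n.

    Squares are [i/n, (i+1)/n] x [j/n, (j+1)/n] inside the bounding
    square [-1, 1] x [-1, 1]; all membership tests use integers.
    """
    def far(k):   # largest |coordinate| (in grid units) in the k-th strip
        return max(abs(k), abs(k + 1))

    def near(k):  # smallest |coordinate| (in grid units) in the k-th strip
        return 0 if k in (-1, 0) else min(abs(k), abs(k + 1))

    inside = meeting = 0
    for i in range(-n, n):
        for j in range(-n, n):
            if far(i) ** 2 + far(j) ** 2 <= n * n:
                inside += 1      # the whole square lies in the disk
            if near(i) ** 2 + near(j) ** 2 <= n * n:
                meeting += 1     # the square has a point in the disk
    return inside / n**2, meeting / n**2

for n in (10, 50, 200):
    inner, outer = disk_contents(n)
    print(n, inner, outer)
```

As the mesh shrinks, the two estimates close in on the area of the disk from below and above, which is what it means for the disk to be measurable in Jordan's sense.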

Jordan proceeds to define the Riemann integral of a bounded function f
of n real variables defined on a measurable set E in Euclidean n-space
([13], pp. 32-37). Let P be a partition of E into measurable sets
$E_1, \ldots, E_n$ with non-overlapping interiors, and let $p_i$ be an arbitrary point
of $E_i$, $i = 1, \ldots, n$. Then

$$s(P) = \sum_{i=1}^{n} f(p_i)\, c(E_i)$$

is a Riemann sum for f on E.
The function f is integrable on the set E provided that the limit

$$\int_E f = \lim_{\delta \to 0} \sum_{i=1}^{n} f(p_i)\, c(E_i) \tag{39}$$

exists, $\delta$ being the maximum of the diameters of the sets $E_1, \ldots, E_n$. In
the one-dimensional case with E being an interval [a, b], this is

$$\int_a^b f(x)\,dx = \lim_{\delta \to 0} \sum_{i=1}^{n} f(p_i)\, c(E_i), \tag{40}$$

where, in comparison with Riemann’s definition (32), the partition of [a, b]
into subintervals has been generalized to a partition of the interval into
measurable sets.

Both (38) and (40) were reformulations of the definition of the Riemann 
integral that, in contrast to (32), are directly susceptible of significant 
generalization. A detailed discussion of the role of these reformulations as 
forerunners to the Lebesgue integral (see Chapter 12) is given by Hawkins 
in Chapters 2-4 of his book on the origins and development of Lebesgue’s
theory of integration [11].


The Arithmetization of Analysis


The calculus of Newton and Leibniz was a calculus of geometric variables, 
of quantities explicitly associated with geometric curves, and much of their 
analysis depended upon intuitive geometric concepts. Euler, Lagrange, and 
Cauchy attempted to substitute the principles of arithmetic for geometric 
intuition in the foundations of analysis; not a single geometrical diagram 
appears in their books on the infinitesimal calculus. However, these first 
attempts at an arithmetization of the subject were only partially successful 
because, prior to the late nineteenth century, the real numbers themselves 
were understood only in an intuitive fashion. 

Since the seventeenth century mathematicians had pragmatically used
irrational numbers (such as $\sqrt{2}$) in an uncritical way without seriously
questioning their precise meaning or nature, relying for computational
purposes upon the assumption that any irrational number can be
arbitrarily closely approximated by rational numbers (e.g., $\sqrt{2} = 1.41421\ldots$).
In particular, it was assumed not only that irrational numbers exist as
needed for the ordinary purposes of analysis, but that they obey the same
laws of algebraic operation as the familiar rational numbers. Nevertheless,
as Richard Dedekind (1831-1916) remarked in an 1872 essay ([7], p. 22),
such a simple fact as $\sqrt{2} \cdot \sqrt{3} = \sqrt{6}$ had never been rigorously
established.

In the absence of a full understanding of the real number system it was 
impossible to provide firm foundations for the calculus. For example, the 
Bolzano-Cauchy proof of the intermediate value theorem for continuous 
functions required the “bounded monotone sequence property” of the real 
numbers, to the effect that every bounded sequence $\{a_n\}$ of numbers
that is either increasing ($a_n \le a_{n+1}$ for all n) or decreasing ($a_n \ge a_{n+1}$ for all
n) is convergent. This same property of the real numbers is needed to
establish the sufficiency of the Cauchy convergence criterion, and it was 
implicitly assumed by both Cauchy and Riemann in their proofs of the 
existence of the integral under appropriate hypotheses. However, the 
validity of this basic property of the real numbers was not verified, but 
merely assumed to be evident on geometrical grounds. 

This vagueness in the foundations of analysis resulted not only in logical 
gaps but also in actual errors on occasion. For example, it was generally 
thought during the early nineteenth century that every continuous function 
is differentiable except perhaps at isolated singular points (such as x =0 
for the function f(x) =|x|); indeed, several calculus texts of this period 
purported to prove this false proposition. It therefore came as a healthy 
shock when Karl Weierstrass (1815-1897) exhibited, in his Berlin lectures 
as early as 1861, a function that is continuous everywhere but
differentiable nowhere. His example was the function

$$f(x) = \sum_{n=0}^{\infty} b^n \cos(a^n \pi x),$$

where a is an odd integer and $b \in (0, 1)$ a constant such that $ab > 1 + 3\pi/2$.
This infinite series converges uniformly on the real line, so f is continuous
everywhere. However, it turns out that given any point $x_0$ and any positive
number M, there exist points $x_1$ and $x_2$ arbitrarily close to $x_0$ such that

$$\frac{f(x_1) - f(x_0)}{x_1 - x_0} > M \quad\text{while}\quad \frac{f(x_2) - f(x_0)}{x_2 - x_0} < -M.$$

It obviously follows that f is not differentiable at $x_0$.
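
The behaviour of these difference quotients can be glimpsed numerically at the single point 0. The sketch below is an illustration under stated assumptions, not Weierstrass' argument: the parameters a = 7 and b = 0.9 (so that ab = 6.3 exceeds 1 + 3π/2), the truncation to 30 terms, and the function names are all choices made here. Sample points are kept rational so that the argument of each cosine can be reduced exactly before it is converted to floating point.

```python
import math
from fractions import Fraction

A, B = 7, 0.9   # illustrative: A odd, 0 < B < 1, A*B = 6.3 > 1 + 3*pi/2

def weierstrass_partial(x, terms=30):
    """Partial sum of sum_{n>=0} B^n cos(A^n pi x) at a rational point x."""
    x = Fraction(x)
    total = 0.0
    for n in range(terms):
        # cos(pi*t) has period 2 in t, so reduce A^n * x modulo 2 exactly
        t = (A ** n * x) % 2
        total += B ** n * math.cos(math.pi * float(t))
    return total

def quotient_at_zero(h):
    """Difference quotient (f(h) - f(0)) / h of the partial sum."""
    return (weierstrass_partial(h) - weierstrass_partial(0)) / float(h)

for m in range(1, 6):
    h = Fraction(1, A ** m)
    # quotient to the right of 0, then to the left of 0
    print(m, quotient_at_zero(h), quotient_at_zero(-h))
```

With h equal to the reciprocal of the mth power of a, every term of index at least m contributes -2 times the nth power of b to f(h) - f(0), so the right-hand quotients fall below -2(ab)^m and the left-hand ones rise above 2(ab)^m; this is the two-sided blow-up described above, seen at one point only.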

Actually, Bolzano had described an example of a non-differentiable 
continuous function in 1834, but it had gone unnoticed. It was Weierstrass’ 
example whose impact made clear the necessity of a re-examination of the 
foundations of analysis. In particular, instead of taking the real number 
system for granted, as “given”, it was necessary that the real numbers be 
constructed or defined in such a way that the existence and properties of 
irrational numbers could be rigorously proved. Dedekind later wrote that, 
when he first lectured on the calculus in 1858, he 


felt more keenly than ever before the lack of a really scientific foundation 
for arithmetic. In discussing the notion of the approach of a variable 
magnitude to a fixed limiting value, and especially in proving the theorem 
that every magnitude which grows continually, but not beyond all limits, 
must certainly approach a limiting value, I had recourse to geometric 
evidences. Even now such resort to geometric intuition in a first presenta- 
tion of the differential calculus, I regard as exceedingly useful, from the 
didactic standpoint, and indeed indispensable, if one does not wish to lose 
too much time. But that this form of introduction into the differential 
calculus can make no claim to being scientific, no one will deny. For 
myself this feeling of dissatisfaction was so overpowering that I made the 
fixed resolve to keep meditating on the question till I should find a purely 
arithmetic and perfectly rigorous foundation for the principles of infinites- 
imal analysis. The statement is so frequently made that the differential 
calculus deals with continuous magnitude, and yet an explanation of this 
continuity is nowhere given; even the most rigorous expositions of the 
differential calculus do not base their proofs upon continuity but, with 
more or less consciousness of the fact, they either appeal to geometric 
notions or those suggested by geometry, or depend upon theorems which 
are never established in a purely arithmetic manner. Among these, for 
example, belongs the above-mentioned theorem, and a more careful 
investigation convinced me that this theorem, or any one equivalent to it, 
can be regarded in some way as a sufficient basis for infinitesimal 
analysis. It then only remained to discover its true origin in the elements 
of arithmetic and thus at the same time to secure a real definition of the 
essence of continuity ([7], pp. 1-2).


The year 1872 was marked by the almost simultaneous publication of
constructions of the real numbers by Dedekind, Georg Cantor (1845-1918),
Charles Méray (1835-1911), and Eduard Heine (1821-1881);
Weierstrass had given an earlier construction in his Berlin University
lectures. The constructions of Dedekind and Cantor are those that are
generally employed today.

Dedekind’s approach was closely related to Eudoxus’ definition of
proportionality for ratios of geometric magnitudes. We saw in Chapter 1
that, given incommensurable magnitudes a and b, this definition of
proportionality serves to separate the set Q of all rational numbers into two
disjoint subsets L and U such that every element of L is less than every
element of U: the rational number m/n is in L if m : n < a : b, and
otherwise is in U. Dedekind noted that, similarly, every rational number r
partitions Q into two sets $A_1$ and $A_2$ such that every element of $A_1$ is less
than every element of $A_2$. There are actually two possibilities, according to
whether r is the largest element of $A_1$ or the smallest element of $A_2$, but
these two corresponding partitions of Q may be regarded as essentially
equivalent.

Dedekind defined a cut of the rational numbers as a partition $(A_1, A_2)$ of
Q into two non-empty disjoint subsets such that every element of $A_1$ is less
than every element of $A_2$. Whereas some cuts are generated by rational
numbers, others are not. For example, if $A_2$ consists of all positive rational
numbers x such that $x^2 > 2$, while $A_1$ consists of all other rational numbers,
then there is neither a largest element of $A_1$ nor a smallest element of $A_2$,
because there is no rational number x such that $x^2 = 2$. Intuitively, this cut
$(A_1, A_2)$ may be regarded as generated by the irrational number $\sqrt{2}$.
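
The claim that this cut has neither a largest lower element nor a smallest upper element can be made concrete. The sketch below uses exact rationals; the map x ↦ (2x + 2)/(x + 2) and the names `in_lower_class` and `nudge` are illustrative choices made here, not taken from Dedekind. The map moves any positive rational strictly closer to the square root of 2 without crossing the cut.

```python
from fractions import Fraction

def in_lower_class(x):
    """True if the rational x lies in the lower class of the cut for sqrt(2)."""
    return x <= 0 or x * x < 2

def nudge(x):
    """Move a positive rational x toward sqrt(2) without crossing the cut.

    With x' = (2x + 2)/(x + 2) one has x' - x = (2 - x^2)/(x + 2) and
    x'^2 - 2 = 2(x^2 - 2)/(x + 2)^2, so x' stays on the same side as x.
    """
    return (2 * x + 2) / (x + 2)

low = Fraction(7, 5)     # 49/25 < 2, so this is in the lower class
high = Fraction(3, 2)    # 9/4 > 2, so this is in the upper class
print(low, "->", nudge(low))     # 7/5 -> 24/17, a larger lower element
print(high, "->", nudge(high))   # 3/2 -> 10/7, a smaller upper element
```

Since the map can be applied again to its own output, no lower element is largest and no upper element is smallest.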

“Whenever”, Dedekind says, “we have to do with a cut $(A_1, A_2)$
produced by no rational number, we create a new, an irrational number $\alpha$,
which we regard as completely defined by this cut $(A_1, A_2)$; we shall say
that the number $\alpha$ corresponds to this cut, or that it produces this cut” ([7],
p. 15). In more modern language, the set R of all real numbers is defined to
be the set of all cuts of the rational numbers, except that the two
essentially equivalent cuts produced by a given rational number are
identified with each other. Thus a real number is a cut $\alpha = (A_1, A_2)$; $\alpha$ is a
rational real number if this cut is generated by a rational number;
otherwise $\alpha$ is an irrational real number.

Order and algebraic operations are easily defined for real numbers
regarded as cuts. If $\alpha = (A_1, A_2)$ and $\beta = (B_1, B_2)$ are different real
numbers, we say that $\alpha < \beta$ provided that $A_1$ is a proper subset of $B_1$. Dedekind
proves that, for any two real numbers $\alpha$ and $\beta$, either $\alpha < \beta$ or $\alpha = \beta$ or
$\alpha > \beta$. He defines the sum $\gamma = (C_1, C_2) = \alpha + \beta$ as follows: the rational
number c is in $C_1$ if there exist $a_1 \in A_1$ and $b_1 \in B_1$ such that $a_1 + b_1 \ge c$;
otherwise $c \in C_2$ ([7], p. 21).


EXERCISE 20. Define the product of two real numbers $\alpha$ and $\beta$. It suffices to
consider the case in which $\alpha$ and $\beta$ are both positive (that is, are greater than the
real number generated by the rational number 0).


Dedekind proves what he calls the continuity property of the real
number system: every cut of the set of real numbers is generated by some
real number. Thus the real numbers are complete in that, whereas the real
numbers are generated by taking cuts of the rational numbers, no new 
numbers are generated by taking cuts of the real numbers. 

The crucial bounded monotone sequence property is equivalent to this
completeness property, but is easily established directly. Let
$\{\alpha_n\}_1^\infty = \{(A_n, B_n)\}_1^\infty$ be a sequence of real numbers such that
$\alpha_n \le \alpha_{n+1}$ and $\alpha_n \le \mu$ for every n, where $\mu = (M, N)$ is a fixed real
number. If $A_0$ is the union of the increasing sequence of sets $\{A_n\}_1^\infty$ and
$B_0$ is the intersection of the decreasing sequence $\{B_n\}_1^\infty$, then it turns
out that $\alpha_0 = (A_0, B_0)$ is a real number such that $\lim_{n \to \infty} \alpha_n = \alpha_0$.

EXERCISE 21. Verify that $\alpha_0 = (A_0, B_0)$ is a cut of the rational numbers. Why is the
set $B_0$ non-empty?


Cantor’s approach was based on the idea of a real number $\alpha$ as the limit
of a sequence $\{a_n\}_1^\infty$ of rational numbers. In this case $\{a_n\}_1^\infty$ will be a
fundamental sequence satisfying Cauchy’s convergence criterion that
$a_m - a_n$ approaches 0 as $m, n \to \infty$. Cantor wants to identify the real
number $\alpha$ with this fundamental sequence of rational numbers. However,
two fundamental sequences $\{a_n\}_1^\infty$ and $\{b_n\}_1^\infty$ will have the same limit if
$\lim_{n \to \infty} (a_n - b_n) = 0$, in which case these two sequences are called
equivalent.

In modern language, Cantor’s construction amounts to defining the set 
of real numbers to be the set of all equivalence classes of fundamental 
sequences of rational numbers. If r is a rational number, then the sequence
{r, r, r, ...} represents the real number that corresponds to r. However,
the most obvious representative of the real number $\sqrt{2}$ is the sequence
{1, 1.4, 1.41, 1.414, ...} consisting of its finite decimal approximations.

Cantor’s real number is perhaps a more complicated object than
Dedekind’s real number, but the sequential definitions of the algebraic
operations are simpler. Let the real numbers $\alpha$ and $\beta$ be represented by the
fundamental sequences $\{a_n\}_1^\infty$ and $\{b_n\}_1^\infty$. Then the sum $\alpha + \beta$ and product
$\alpha\beta$ are represented by the fundamental sequences $\{a_n + b_n\}_1^\infty$ and $\{a_n b_n\}_1^\infty$,
respectively. We say that $\alpha > \beta$ provided there is a positive d such that
$a_n \ge b_n + d$ for n sufficiently large. Cantor proves that his real numbers are
complete in the sense that every fundamental sequence of real numbers
converges to a real number. In particular, every bounded monotone
sequence of real numbers, being a fundamental sequence, converges.
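
Cantor's notions of fundamental sequence, equivalence, and termwise product can be tried out on two rational sequences for the square root of 2. This is a hedged sketch: the decimal truncations are the representative mentioned in the text, while the second sequence (Newton's iteration started at 2) and the names `decimal_sqrt2` and `newton_sqrt2` are assumptions introduced here for comparison.

```python
from fractions import Fraction
from math import isqrt

def decimal_sqrt2(n):
    """n-place decimal truncation of sqrt(2), as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def newton_sqrt2(n):
    """n-th Newton iterate for x^2 = 2 starting from 2 (a second sequence)."""
    x = Fraction(2)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

for n in range(1, 7):
    a, b = decimal_sqrt2(n), newton_sqrt2(n)
    # b - a shrinks (the sequences are equivalent); a*a approaches 2
    print(n, float(a), float(b - a), float(2 - a * a))
```

The termwise differences tend to 0, so the two sequences represent the same real number in Cantor's sense, and the termwise squares of the first sequence approach 2, as the product rule requires.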


EXERCISE 22. Show that $\{a_n b_n\}_1^\infty$ is a fundamental sequence if $\{a_n\}_1^\infty$ and $\{b_n\}_1^\infty$
are. Hint: $a_m b_m - a_n b_n = a_m(b_m - b_n) + b_n(a_m - a_n)$. Use the fact that every
fundamental sequence is bounded.


In either approach, Dedekind’s or Cantor’s, the set Q of rational 
numbers is taken as the starting point for the construction of the set R of 
all real numbers. Then Q is enlarged by the addition of irrational numbers
which continue to obey the familiar algebraic laws. Most important, the
construction of R permits the rigorous verification of the completeness 
property that, in its various forms, plays a crucial role in infinitesimal 
analysis. 

The construction of the real number system was the principal step in the 
arithmetization of analysis during the closing third of the nineteenth 
century. The final loose end was tied by Weierstrass in his purely 
arithmetical formulation of the limit concept, which previously had been 
tinged with connotations of continuous motion: it was said that
$\lim_{x \to a} f(x) = L$ provided that f(x) approaches L as x approaches a. Weierstrass
objected to this “dynamic” description of limits, and replaced it with a
“static” formulation involving only real numbers, with no appeal to motion
or geometry: $\lim_{x \to a} f(x) = L$ provided that, given $\varepsilon > 0$, there exists a
number $\delta > 0$ such that $|f(x) - L| < \varepsilon$ if $0 < |x - a| < \delta$. With the various
types of limits appearing in the calculus reformulated in this way, the 
arithmetization of analysis was complete, and the calculus had assumed 
precisely the form in which it appears in twentieth century expositions. 
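
Weierstrass' static definition turns a limit statement into a recipe: for each tolerance on the values, produce a tolerance on the arguments. The sketch below does this for the illustrative function x squared, which is not an example from the text; the names `delta_for_square` and `check` are hypothetical, and the sampling is only a spot check, not a proof.

```python
def delta_for_square(a, eps):
    """A delta that works for f(x) = x**2 at the point a.

    If |x - a| < delta <= 1 then |x + a| < 2|a| + 1, so
    |x**2 - a**2| = |x - a| |x + a| < delta * (2|a| + 1) <= eps.
    """
    return min(1.0, eps / (2 * abs(a) + 1))

def check(a, eps, samples=1000):
    """Test |f(x) - L| < eps at sample points with 0 < |x - a| < delta."""
    delta = delta_for_square(a, eps)
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)      # strictly between 0 and delta
        for x in (a - offset, a + offset):
            if not abs(x * x - a * a) < eps:
                return False
    return True

print(all(check(a, eps) for a in (0.0, 1.5, -3.0) for eps in (1.0, 0.1, 1e-6)))
```

Nothing in the recipe refers to motion or to x "approaching" a; only inequalities between real numbers are involved.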


Postscript: The Twentieth Century


This brief closing chapter is devoted to two twentieth century develop- 
ments that have in very different ways served to complete the historical 
development of the calculus. The comprehensive theory of integration that 
stems from the work of Henri Lebesgue (1875-1941) is (in a certain
technical sense) the ultimate generalization of the concept of the integral 
for real-valued functions of a real variable. The non-standard analysis of 
Abraham Robinson (1918-1974) provides at long last a logical foundation 
for infinitesimals as they were frequently used in the seventeenth and 
eighteenth centuries. 


The Lebesgue Integral and 
the Fundamental Theorem of Calculus 


The “fundamental theorem of calculus” provides a generic formulation of 
the inverse relationship between differentiation and integration. The de- 
rivative of the (indefinite) integral of a function is that function, 


$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x), \tag{1}$$


and the integral of the derivative of a function is (to within a constant) that 
function, 


$$\int_a^x f'(t)\,dt = f(x) - f(a). \tag{2}$$


The status of the fundamental theorem in Cauchy’s theory of integration
was quite satisfying: Formulas (1) and (2) held so long as the functions
being integrated were continuous, and Cauchy only defined the integral for
continuous functions.

With Riemann’s more general definition of the integral for discontinu- 
ous functions, however, the fundamental theorem lost its completeness of 
scope. Formula (2) is not meaningful for a differentiable function whose 
derivative is not Riemann integrable, and the classical proof establishes 
Formula (1) only at points of continuity of the function f. The restoration 
of the fundamental theorem to a satisfactory status was one of the first
fruits of the new theory of integration that Lebesgue introduced in his 1902 
doctoral thesis [4] and expanded in his 1904 book [5] and subsequent 
papers. An excellent outline of Lebesgue’s work is given by Hawkins ([1],
Chapter 5).

In Jordan’s approach to Riemann integration, the lower and upper
Riemann integrals were defined by

$$R\underline{\int_a^b} f(x)\,dx = \operatorname{lub} \sum_{i=1}^{n} m_i\, c(E_i) \tag{3}$$

and

$$R\overline{\int_a^b} f(x)\,dx = \operatorname{glb} \sum_{i=1}^{n} M_i\, c(E_i), \tag{4}$$

where $\{E_1, \ldots, E_n\}$ denotes a partition of [a, b] into Jordan measurable
sets with Jordan contents $c(E_i)$, and $m_i$ and $M_i$ denote the greatest lower
bound and least upper bound, respectively, of f(x) for $x \in E_i$. Lebesgue’s
basic idea was to enlarge the class of functions for which the integral is
defined by enlarging the class of sets that are measurable. If the measure
m(E) is defined for a class of sets that properly includes the Jordan
measurable sets, and m(E) = c(E) if E is Jordan measurable, then the
integral can be generalized by replacing the sets $E_i$ in (3) and (4) by these
new measurable sets.

Lebesgue generalized the concept of measure by basing it on countably
infinite rather than finite coverings. Given a subset E of the real line, he
defined its outer measure $m_o(E)$ as the greatest lower bound of the sums

$$\sum_{n=1}^{\infty} \operatorname{length}(I_n),$$

where $\{I_n\}_1^\infty$ is a sequence of intervals whose union contains E. If
$E \subset [a, b]$, then the inner measure of E is defined by

$$m_i(E) = (b - a) - m_o([a, b] - E).$$

The bounded set E is called (Lebesgue) measurable with measure m(E)
provided that $m_o(E) = m_i(E) = m(E)$. Since it is clear that

$$c_i(E) \le m_i(E) \le m_o(E) \le c_o(E),$$

it follows that the set E is Lebesgue measurable if it is Jordan measurable,
in which case m(E) = c(E). Although for convenience we have defined
Lebesgue measure only for bounded sets, this restriction is actually
unnecessary.
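
The effect of allowing countably many covering intervals shows up already for the rational numbers in [0, 1]. The sketch below covers the nth rational in a list by an interval of length ε/2^n, so that the total length stays below ε however long the list is. It is an illustration only: the finite cut-off (denominators up to 12), the value of ε, and the names `rationals_in_unit_interval` and `cover` are assumptions made here.

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval(max_denominator):
    """List the rationals p/q in [0, 1] with q <= max_denominator, once each."""
    points = [Fraction(0), Fraction(1)]
    for q in range(2, max_denominator + 1):
        points += [Fraction(p, q) for p in range(1, q) if gcd(p, q) == 1]
    return points

def cover(points, eps):
    """Cover the n-th point by an interval of length eps / 2**n (n = 1, 2, ...)."""
    intervals = []
    for n, r in enumerate(points, start=1):
        half = Fraction(eps) / 2 ** (n + 1)
        intervals.append((r - half, r + half))
    return intervals

points = rationals_in_unit_interval(12)
intervals = cover(points, Fraction(1, 100))
total_length = sum(b - a for a, b in intervals)
print(len(points), float(total_length))   # the total stays below eps = 0.01
```

Because the same bound holds for the full enumeration, the greatest lower bound of such sums is 0, whereas any finite family of intervals covering these rationals must have total length at least 1.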

The principal advantage of Lebesgue measure over previous measure
concepts is that it is countably additive: If $\{E_n\}_1^\infty$ is a sequence of
mutually disjoint measurable sets, then their union is measurable with

$$m\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} m(E_n).$$

For example, if Q is the (countable) set of rational numbers in the unit
interval, it follows immediately that Q is measurable with m(Q) = 0. It is a
simple consequence of countable additivity that

$$m\left(\bigcup_{n=1}^{\infty} E_n\right) = \lim_{n \to \infty} m(E_n) \quad\text{if } E_n \subset E_{n+1} \tag{5}$$

for each n, while

$$m\left(\bigcap_{n=1}^{\infty} E_n\right) = \lim_{n \to \infty} m(E_n) \quad\text{if } E_{n+1} \subset E_n \tag{6}$$

for each n. Also, if E and F are measurable sets with $F \subset E$, then the
difference E - F is measurable with

$$m(E - F) = m(E) - m(F).$$


The lower and upper Lebesgue integrals $L\underline{\int_a^b} f(x)\,dx$ and
$L\overline{\int_a^b} f(x)\,dx$ of a bounded function f are defined just as the lower and
upper Riemann integrals (3) and (4), except that now $\{E_1, \ldots, E_n\}$ is a
partition of [a, b] into Lebesgue measurable sets and $c(E_i)$ is replaced by
$m(E_i)$. Then f is Lebesgue integrable on [a, b] provided that these lower
and upper integrals are equal. Since it is clear that

$$R\underline{\int_a^b} f(x)\,dx \le L\underline{\int_a^b} f(x)\,dx \le L\overline{\int_a^b} f(x)\,dx \le R\overline{\int_a^b} f(x)\,dx,$$


it follows that the function f is Lebesgue integrable on [a, b] if it is 
Riemann integrable on [a, b], in which case its Lebesgue and Riemann 
integrals are equal. 

Lebesgue proved that a bounded function on a closed interval is Rie- 
mann integrable if and only if its set of discontinuities has measure zero, 
whereas every bounded measurable function on a closed interval is Le- 
besgue integrable. A function is called measurable if the inverse image of 
each open interval is a measurable set. For example the Dirichlet function 
on [0, 1] is not Riemann integrable because its set of discontinuities is the 
whole interval (of measure $1 \ne 0$), but it is Lebesgue integrable because it is
obviously measurable (why?). 

Lebesgue’s proof that every bounded measurable function is (Lebesgue)
integrable was based on the new idea of partitioning the range of a
function (rather than its domain, à la Cauchy and Riemann) into
subintervals. If $m \le f(x) < M$ for $x \in [a, b]$, let the points

$$m = y_0 < y_1 < \cdots < y_n = M$$

partition the interval [m, M] into n equal subintervals each having length
(M - m)/n. If $E_i$ denotes the set of points $x \in [a, b]$ such that
$y_{i-1} \le f(x) < y_i$, then the sets $\{E_i\}$ are measurable if f is measurable.
The lower and upper Lebesgue sums corresponding to the partition
$\{E_1, \ldots, E_n\}$ of [a, b] are

$$\sum_{i=1}^{n} y_{i-1}\, m(E_i) \quad\text{and}\quad \sum_{i=1}^{n} y_i\, m(E_i).$$

Because the difference

$$\sum_{i=1}^{n} (y_i - y_{i-1})\, m(E_i) = \frac{M - m}{n} \sum_{i=1}^{n} m(E_i) = \frac{1}{n}(M - m)(b - a)$$

of these sums can be made arbitrarily small by taking n sufficiently large,
it follows that the lower and upper Lebesgue integrals of f are equal, so the
bounded measurable function f is Lebesgue integrable on [a, b].
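
The range-partition construction can be carried out by hand for a simple function. The sketch below takes f(x) = x squared on [0, 1] with the bounds m = 0 and M = 2; these choices and the name `lebesgue_sums` are assumptions made for illustration. For this increasing function each set of the partition is an interval, so its measure is simply its length.

```python
import math

def lebesgue_sums(n, m=0.0, M=2.0):
    """Lower and upper Lebesgue sums of f(x) = x**2 on [0, 1].

    The range interval [m, M) is cut into n equal pieces.  For this
    increasing f, E_i = {x in [0, 1] : y_{i-1} <= x**2 < y_i} is an
    interval, so its measure is just its length.
    """
    ys = [m + (M - m) * i / n for i in range(n + 1)]
    lower = upper = 0.0
    for y_prev, y_next in zip(ys, ys[1:]):
        left = min(math.sqrt(y_prev), 1.0)     # clip to the domain [0, 1]
        right = min(math.sqrt(y_next), 1.0)
        measure = right - left
        lower += y_prev * measure
        upper += y_next * measure
    return lower, upper

for n in (10, 100, 1000):
    lower, upper = lebesgue_sums(n)
    print(n, lower, upper, upper - lower)
```

The printed differences equal (M - m)(b - a)/n = 2/n up to rounding, and the two sums bracket the value 1/3 of the integral.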

In addition to the fact that the class of integrable functions has been 
enlarged considerably, the power of the Lebesgue integral results from the 
ease with which it handles limits of functions. Suppose that
$\lim_{n \to \infty} f_n(x) = f(x)$ for each $x \in [a, b]$, and we ask whether

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx. \tag{7}$$

The only easy result for the Riemann integral is that (7) holds if the
functions $\{f_n\}_1^\infty$ are continuous and the convergence is uniform (otherwise
f may not even be Riemann integrable). But these conditions are too 
strong for many applications; indeed, we have seen in previous chapters 
numerous examples of “termwise integration” under weaker conditions. 
For the Lebesgue integral it turns out that (7) holds under very weak 
conditions; this is the main reason for the modern prominence of Le- 
besgue’s theory of integration. 

It is a consequence of elementary properties of measurable sets that the
pointwise limit f of a convergent sequence $\{f_n\}_1^\infty$ of measurable functions
is itself a measurable function (see Royden [7], p. 56). If in addition f is
bounded on [a, b], then it follows that f is Lebesgue integrable on [a, b].
Lebesgue’s bounded convergence theorem asserts that if $\{f_n\}_1^\infty$ is a
convergent sequence of measurable functions such that $|f_n(x)| \le K$ for some fixed
K and each $x \in [a, b]$, and $f(x) = \lim_{n \to \infty} f_n(x)$, then

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx \tag{7}$$

(the integrals here being Lebesgue integrals). Thus the limit of the
sequence of integrals is “what it ought to be” if, in addition to being
measurable, the functions $\{f_n\}_1^\infty$ are “uniformly bounded” on [a, b]. This
illustrates the nice convergence properties (some of them even stronger)
that the Lebesgue integral enjoys.

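For example, let $\{r_n\}_1^\infty$ be the set of rational numbers in the interval
[0, 1] and define

$$\varphi_n(x) = \begin{cases} 0 & \text{if } x \in \{r_1, \ldots, r_n\}, \\ 1 & \text{otherwise.} \end{cases}$$

Then it is clear that each $\varphi_n$ is measurable with $\int_0^1 \varphi_n(x)\,dx = 1$. But
$\varphi(x) = \lim_{n \to \infty} \varphi_n(x)$ is the Dirichlet function, and Lebesgue’s bounded
convergence theorem now gives

$$\int_0^1 \varphi(x)\,dx = \lim_{n \to \infty} \int_0^1 \varphi_n(x)\,dx = 1.$$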

The proof of the bounded convergence theorem is notable for its
simplicity. Let $\{f_n\}_1^\infty$ be a convergent sequence of measurable functions
such that $|f_n(x)| \le K$ for all n and $x \in [a, b]$. Given $\varepsilon > 0$, denote by $E_n$ the
set of all points $x \in [a, b]$ such that $|f_m(x) - f(x)| \ge \varepsilon$ for some $m \ge n$. Then
$\{E_n\}_1^\infty$ is a decreasing sequence of measurable sets, and $\bigcap_{n=1}^{\infty} E_n$ is empty
because $f_n(x)$ converges to f(x) for all $x \in [a, b]$. Therefore (6) implies that
$\lim_{n \to \infty} m(E_n) = 0$. Since $|f_n(x) - f(x)| < \varepsilon$ unless $x \in E_n$, and
$|f_n(x) - f(x)| \le 2K$ everywhere, it follows that

$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_a^b |f_n(x) - f(x)|\,dx \le 2K\, m(E_n) + \varepsilon (b - a).$$

Since $\lim_{n \to \infty} m(E_n) = 0$ and $\varepsilon > 0$ can be taken arbitrarily small, it follows
that $\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$ as desired.

The bounded convergence theorem is the tool that is needed to establish 
the fundamental theorem of calculus for the Lebesgue integral. 


Theorem A. If f is differentiable and f’ is bounded on [a, b], then f’ is
Lebesgue integrable, and

$$\int_a^b f'(x)\,dx = f(b) - f(a). \tag{2}$$


PROOF. Let $g_n(x) = [f(x + h_n) - f(x)]/h_n$ with $h_n = 1/n$. Then $\{g_n\}_1^\infty$ is a
uniformly bounded sequence of measurable functions converging to f’ on
[a, b], so the bounded convergence theorem gives

$$\begin{aligned}
\int_a^b f'(x)\,dx &= \lim_{n \to \infty} \int_a^b g_n(x)\,dx \\
&= \lim_{n \to \infty} \frac{1}{h_n} \int_a^b [f(x + h_n) - f(x)]\,dx \\
&= \lim_{n \to \infty} \frac{1}{h_n} \int_b^{b + h_n} f(x)\,dx - \lim_{n \to \infty} \frac{1}{h_n} \int_a^{a + h_n} f(x)\,dx \\
&= f(b) - f(a)
\end{aligned}$$

because the function f, being differentiable, is continuous. □


Theorem B. Let f be bounded and measurable on [a, b], and define

$$F(x) = \int_a^x f(t)\,dt.$$

Then there exists a set $E \subset [a, b]$ having measure zero such that

$$F'(x) = f(x) \tag{1}$$

for all x not in E. That is, (1) holds “almost everywhere.”


PROOF. Let $f_+(x) = \max\{f(x), 0\}$ and $f_-(x) = \max\{-f(x), 0\}$. Then $f_+$
and $f_-$ are non-negative functions such that $f = f_+ - f_-$. Then

$$F(x) = \int_a^x f_+(t)\,dt - \int_a^x f_-(t)\,dt = F_1(x) - F_2(x),$$

where $F_1$ and $F_2$ are monotone non-decreasing functions. But it is a fact
independent of integration that every monotone function is differentiable
almost everywhere (see Royden [7], p. 82). Hence F’(x) exists almost
everywhere. A slight generalization of the proof of Theorem A now yields

$$\int_a^c F'(x)\,dx = F(c) - F(a) = \int_a^c f(x)\,dx,$$

so

$$\int_a^c [F'(x) - f(x)]\,dx = 0$$

for all $c \in [a, b]$. By an elementary property of the Lebesgue integral ([7],
p. 87, Lemma 7), it therefore follows that F’(x) = f(x) except possibly on a
set of measure zero. □


Theorems A and B together say that differentiation and Lebesgue
integration are inverse operations on very large classes of functions. If f is
a bounded measurable function on [a, b], then

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x) \tag{1}$$

except possibly on a set of measure zero. If f is a bounded measurable
function with f(a) = 0, and its derivative f’ exists and is bounded on [a, b],
then

$$\int_a^x f'(t)\,dt = f(x). \tag{2}$$


These two versions of the fundamental theorem of calculus provide a 
definitive rigorous formulation of the inverse relationship between dif- 
ferentiation and integration that Newton and Leibniz discovered on con- 
ceptual grounds and exploited in the seventeenth century. 


Non-standard Analysis—The Vindication of Euler? 


For most of the past century, since Weierstrass, students of the calculus 
have been assiduously taught that infinitesimals do not exist, and must not 
be mentioned in formal mathematical discourse. But in 1960 Abraham 
Robinson proved that infinitesimals do exist as genuine mathematical 
objects, and can serve as the basis for an alternative rigorous development 
of the calculus! His 1966 book Non-standard Analysis [6] showed how to 
develop much of modern analysis in terms of infinitesimals, and in 1976 a 
“non-standard” introductory calculus textbook [2] by H. J. Keisler ap- 
peared. Keisler’s monograph Foundations of Infinitesimal Calculus [3] is the 
best introduction to infinitesimals and non-standard calculus at an inter- 
mediate level. 

Recall that the ordered field R of real numbers is complete—it satisfies 
the least upper bound axiom. In fact, every complete ordered field is 
isomorphic to the field R of real numbers. Non-standard analysis is based 
on the fact that there exists a (non-complete) field R* of “hyperreal” 
numbers that contains R as a proper subfield (Axiom 1 below), such that
every function $f(x_1, x_2, \ldots, x_n)$ of n real variables has a natural extension
f* which is a function of n “hyperreal” variables (Axiom 2), and such that 
if two systems of formulas have the same real solutions then they have the 
same “hyperreal” solutions (Axiom 3). Axioms 1-3 are established in 
Section 1E of [3]. 


Axiom 1. There exists a proper ordered field extension R* of the field 
R of real numbers. 


The elements of R* are called hyperreal numbers. An element x € R* is 
called infinitesimal if |x| < r for every positive real number r; finite if |x| < r
for some positive real number r; infinite if |x| > r for all real numbers r.
Both infinitesimal and infinite hyperreal numbers actually exist ([3], p. 7,
Theorem 8). Sums, differences, and products of infinitesimals are
infinitesimal, as is the reciprocal of an infinite number, and the product of an
infinitesimal and a finite number is infinitesimal ([3], p. 4, Theorem 3).

Two elements $x, y \in R^*$ are said to be infinitely close, written $x \approx y$,
provided their difference x - y is infinitesimal. According to the “standard
part theorem” ([3], p. 5) every finite hyperreal number x is infinitely close
to a unique real number r. This unique $r \approx x$ is called the standard part of
x, written r = st(x). If x and y are finite then $x \approx y$ if and only if
st(x) = st(y), and

(i) st(x + y) = st(x) + st(y),
(ii) st(xy) = st(x) st(y),
(iii) st(x/y) = st(x)/st(y) if $st(y) \ne 0$,
(iv) $st(\sqrt{x}) = \sqrt{st(x)}$.


Axiom 2 (Function Axiom). Let f be a real-valued function defined 
on some subset of the set R^n of all n-tuples of real numbers. Then to
f there corresponds a hyperreal-valued function f* of n hyperreal
variables, called the natural extension of f. The field operations of 
R* are the natural extensions of the field operations of R. 


The domain of definition of f* is the natural extension of the domain of
definition of f. Given a set X ⊂ R, the natural extension X* of X is defined as
follows. Consider a (finite) system F of formulas that has X as its set of
real solutions (intuitively, a formula is simply an equality or inequality of
functions; see [3], p. 10, for the precise definition). If F* is the system of
natural extensions of the formulas in F, then the natural extension of X is 
the set of hyperreal solutions of the system F*. By Axiom 3 below, X* is 
independent of the particular choice F of a system of formulas having the 
set X as its set of solutions. Natural extension of sets preserves the usual 
set operations, 


(X ∪ Y)* = X* ∪ Y*, (X ∩ Y)* = X* ∩ Y*,
X ⊂ Y if and only if X* ⊂ Y*.


If X is a bounded set of real numbers, then X* consists of finite hyperreal 
numbers. If f is a real-valued function of one real variable, then 


(domain f)* = domain(f*) and (range f)* = range(f*).


Axiom 3 (Solution Axiom). If two systems of formulas have exactly 
the same real solutions, then their natural extensions have exactly 
the same hyperreal solutions. 


For example, let f and g be real-valued functions defined on the set
D ⊂ R, with f(x) ≤ g(x) for all x ∈ D. Then the two formulas f(x) = f(x)
and f(x) ≤ g(x) have the same set of real solutions, namely D. Hence the
two formulas f*(x) = f*(x) and f*(x) ≤ g*(x) have the same set D* of
hyperreal solutions. Thus it follows from the solution axiom that f(x) ≤
g(x) for all x ∈ D implies f*(x) ≤ g*(x) for all x ∈ D*.
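
The same pattern transfers identities as well as inequalities. The following sketch is an illustration only; it assumes that "x = x" qualifies as a formula in the sense of [3], and it uses the fact from Axiom 2 that addition and multiplication in R* are the natural extensions of those in R.

```latex
% Two formulas with the same real solution set, namely all of R:
\[
\sin^2 x + \cos^2 x = 1
\qquad\text{and}\qquad
x = x .
\]
% By the solution axiom their natural extensions have the same hyperreal
% solutions. Every hyperreal x satisfies x = x, so
\[
(\sin^* x)^2 + (\cos^* x)^2 = 1
\qquad\text{for all } x \in \mathbf{R}^* ,
\]
% including infinitesimal and infinite values of x.
```
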

When no confusion will result, we write f in place of f*. The function f
of a real variable is differentiable at a ∈ R provided that the quotient


\[ \frac{f(a+\Delta x) - f(a)}{\Delta x} \]

is finite and has the same standard part for every non-zero infinitesimal
Δx ≈ 0. Its derivative at a is then

\[ f'(a) = \operatorname{st}\left( \frac{f(a+\Delta x) - f(a)}{\Delta x} \right). \tag{8} \]


Of course this definition turns out to be equivalent to the usual one in 
terms of limits of real numbers. But (8) can be applied directly to calculate 
derivatives by taking standard parts instead of using limits. For example, 


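\[
\begin{aligned}
\frac{d}{dx}\sqrt{x}
  &= \operatorname{st}\left( \frac{\sqrt{x+\Delta x}-\sqrt{x}}{\Delta x} \right)
   = \operatorname{st}\left( \frac{x+\Delta x-x}{\Delta x\,(\sqrt{x+\Delta x}+\sqrt{x})} \right) \\
  &= \operatorname{st}\left( \frac{1}{\sqrt{x+\Delta x}+\sqrt{x}} \right)
   = \frac{1}{\operatorname{st}(\sqrt{x+\Delta x}+\sqrt{x})} \\
  &= \frac{1}{\sqrt{x}+\sqrt{x}}
   = \frac{1}{2\sqrt{x}} .
\end{aligned}
\]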
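
A second calculation in the same style may help fix the method. The sketch below is an added illustration, not taken from the text: it applies (8) together with rules (ii) and (iii) to the function 1/x, assuming x is a non-zero real number and Δx a non-zero infinitesimal (so that x + Δx ≠ 0).

```latex
\[
\begin{aligned}
\frac{d}{dx}\,\frac{1}{x}
  &= \operatorname{st}\left( \frac{1}{\Delta x}\left[ \frac{1}{x+\Delta x} - \frac{1}{x} \right] \right)
   = \operatorname{st}\left( \frac{x-(x+\Delta x)}{\Delta x\; x\,(x+\Delta x)} \right) \\
  &= \operatorname{st}\left( \frac{-1}{x\,(x+\Delta x)} \right)
   = \frac{-1}{x\,\operatorname{st}(x+\Delta x)}
   = -\frac{1}{x^{2}} .
\end{aligned}
\]
% Rule (iii) applies in the last line because the denominator has
% standard part x^2, which is non-zero.
```
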

If we write y = f(x), Δy = f(x + Δx) − f(x), then

\[ \Delta y = f'(x)\,\Delta x + \varepsilon\,\Delta x \]

where ε = (Δy/Δx) − f'(x) is an infinitesimal if Δx is, so Δy ≈ 0 also. The
non-standard derivation of the product rule for y = uv is


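\[
\begin{aligned}
\frac{\Delta y}{\Delta x}
  &= \frac{(u+\Delta u)(v+\Delta v)-uv}{\Delta x}
   = u\,\frac{\Delta v}{\Delta x} + v\,\frac{\Delta u}{\Delta x} + \Delta u\,\frac{\Delta v}{\Delta x}, \\
\frac{dy}{dx}
  &= \operatorname{st}\left( \frac{\Delta y}{\Delta x} \right)
   = u\,\operatorname{st}\left( \frac{\Delta v}{\Delta x} \right)
   + v\,\operatorname{st}\left( \frac{\Delta u}{\Delta x} \right)
   + 0\cdot\operatorname{st}\left( \frac{\Delta v}{\Delta x} \right) \\
  &= u\,\frac{dv}{dx} + v\,\frac{du}{dx}
\end{aligned}
\]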


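because Δu ≈ 0. To derive the chain rule for y = g(x), x = f(t), first write

\[ \Delta y = g'(x)\,\Delta x + \varepsilon\,\Delta x \]

where ε ≈ 0. Then

\[ \frac{\Delta y}{\Delta t} = g'(x)\,\frac{\Delta x}{\Delta t} + \varepsilon\,\frac{\Delta x}{\Delta t} \]

so, taking standard parts, we obtain

\[
\frac{dy}{dt}
  = g'(x)\,\operatorname{st}\left( \frac{\Delta x}{\Delta t} \right)
  + 0\cdot\operatorname{st}\left( \frac{\Delta x}{\Delta t} \right)
  = g'(f(t))\,f'(t).
\]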


Thus the non-standard proof is simpler than the standard one.

To see that f'(a) = 0 at a local maximum, let Δx be a positive infinitesimal.
Then

\[ f(a+\Delta x) \le f(a), \qquad f(a-\Delta x) \le f(a), \]

so

\[ \frac{f(a+\Delta x)-f(a)}{\Delta x} \;\le\; 0 \;\le\; \frac{f(a-\Delta x)-f(a)}{-\Delta x} . \]

Taking standard parts then gives f'(a) ≤ 0 ≤ f'(a).

We say the real function f is continuous at a ∈ R if f(x) ≈ f(a) for all
x ≈ a, that is, f(x) is infinitely close to f(a) when x is infinitely close to a
(just as Cauchy said it!). The function f is uniformly continuous on the set
S ⊂ R if f(x) ≈ f(y) for all x, y ∈ S* such that x ≈ y. The non-standard
proof that f is uniformly continuous on the closed and bounded set S if it
is continuous at each point of S is almost trivial (in contrast with the
standard proof). Consider x, y ∈ S* with x ≈ y. Then x and y are finite
because S is bounded. From the fact that S is closed it follows easily that
a = st(x) = st(y) is a point of S. Then f(x) ≈ f(a) and f(y) ≈ f(a) because f
is continuous at a, so f(x) ≈ f(y) as desired. Of course it turns out that the
above definitions of continuity and uniform continuity are equivalent to
the standard ones ([3], Chapter 5).
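
To see the two definitions pull apart when S is not bounded, consider the following sketch for f(x) = x² on S = R. It is an added illustration; H is a symbol introduced here for an arbitrary positive infinite hyperreal number, and ε for an arbitrary infinitesimal.

```latex
% Continuity at a real point a: write x = a + \varepsilon with \varepsilon \approx 0.
\[
f(x) - f(a) = (a+\varepsilon)^2 - a^2 = 2a\varepsilon + \varepsilon^2 \approx 0 ,
\]
% since 2a is finite and \varepsilon is infinitesimal. So f is continuous at each a.
%
% Failure of uniform continuity on R: take x = H and y = H + 1/H,
% so that y - x = 1/H is infinitesimal and x \approx y. But
\[
f(y) - f(x) = \left( H + \frac{1}{H} \right)^{2} - H^{2} = 2 + \frac{1}{H^{2}} ,
\]
% whose standard part is 2, so f(x) and f(y) are not infinitely close.
```

The argument in the text breaks down here exactly where one would expect: x = H is not finite, so it has no standard part a at which continuity could be invoked.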

The non-standard definition of the integral requires the notion of a 
hyperinteger. The set of hyperintegers is the natural extension Z* of the set 
Z of (real) integers. A real number is an integer if and only if it is a 
hyperinteger, and every finite hyperinteger is an integer. 

Given a bounded function f on the (real) interval [a, b], let Δx
= (b − a)/n where n is a (real) positive integer. The usual lower and upper
Riemann sums for f,

\[ L(n) = \sum_{i=1}^{n} m_i\,\Delta x \quad\text{and}\quad U(n) = \sum_{i=1}^{n} M_i\,\Delta x, \]

corresponding to the partition of [a, b] into n equal subintervals each
having length Δx, may be regarded as functions defined on the set Z_+ of
positive integers. Their natural extensions L* and U* are defined on the
set Z*_+ of positive hyperintegers, and are finite-valued because f(x) is
bounded on [a, b]. Then f is Riemann integrable on [a, b] if, for any infinite
positive hyperinteger N, the values L*(N) and U*(N) are infinitely close,
L*(N) ≈ U*(N), and their common standard part


st(L*(N)) = st(U*(N)) 
is independent of the infinite hyperinteger N. 

A suggestive notation for L*(N) and U*(N) is

\[ L^*(N) = \sum_{a}^{b} m_i\,dx \quad\text{and}\quad U^*(N) = \sum_{a}^{b} M_i\,dx \]

where dx = (b − a)/N ≈ 0. These are called the infinite lower and upper
Riemann sums of f with respect to the infinitesimal dx; each may be
visualized as the sum of the areas of infinitely many vertical strips having
infinitesimal width dx. The value of the integral is then

\[
\int_a^b f(x)\,dx
  = \operatorname{st}\left( \sum_{a}^{b} m_i\,dx \right)
  = \operatorname{st}\left( \sum_{a}^{b} M_i\,dx \right).
\]


There is a non-standard proof that the function f is integrable if it is 
continuous ([3], p. 96, Lemma 1).
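
For a concrete case, the definition can be checked by hand for f(x) = x on [0, 1]. The sketch below is an added illustration; it assumes, as in the remark following the solution axiom, that an identity valid for every positive integer n carries over to every positive hyperinteger N.

```latex
% With \Delta x = 1/n and x_i = i/n, the function f(x) = x has
% m_i = (i-1)/n and M_i = i/n on the i-th subinterval, so
\[
L(n) = \sum_{i=1}^{n} \frac{i-1}{n}\cdot\frac{1}{n} = \frac{1}{2} - \frac{1}{2n},
\qquad
U(n) = \sum_{i=1}^{n} \frac{i}{n}\cdot\frac{1}{n} = \frac{1}{2} + \frac{1}{2n}.
\]
% Transferring these identities to an infinite positive hyperinteger N:
\[
L^*(N) = \frac{1}{2} - \frac{1}{2N}, \qquad U^*(N) = \frac{1}{2} + \frac{1}{2N},
\]
% and 1/(2N) is infinitesimal, so L^*(N) \approx U^*(N) and
\[
\int_0^1 x\,dx = \operatorname{st}\bigl(L^*(N)\bigr) = \operatorname{st}\bigl(U^*(N)\bigr) = \frac{1}{2},
\]
% independently of the choice of N.
```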

The following proof of the fundamental theorem of calculus, in the form 
that says that 


\[ \int_a^b f(x)\,dx = F(b) - F(a) \]


if F' = f and f is integrable on [a, b], is the non-standard version of a
standard proof that was given by Darboux in 1875 (see the references to
Chapter 11). Let Δx = (b − a)/n where n is a (real) positive integer, and
consider the partition of [a, b] into n equal subintervals. Application of the
mean value theorem to F on the ith subinterval [x_{i−1}, x_i] gives

\[ F(x_i) - F(x_{i-1}) = F'(\bar{x}_i)\,\Delta x = f(\bar{x}_i)\,\Delta x \]

for some \bar{x}_i ∈ [x_{i−1}, x_i]. Hence

\[ m_i\,\Delta x \le F(x_i) - F(x_{i-1}) \le M_i\,\Delta x . \]

Summation of these inequalities for i = 1, 2, ..., n yields

\[
\begin{gathered}
\sum_{i=1}^{n} m_i\,\Delta x \;\le\; \sum_{i=1}^{n} \bigl[ F(x_i) - F(x_{i-1}) \bigr] \;\le\; \sum_{i=1}^{n} M_i\,\Delta x, \\
L(n) \le F(b) - F(a) \le U(n)
\end{gathered}
\]


for every n ∈ Z_+. It then follows from the solution axiom (see the remark
following it above) that

\[ L^*(N) \le F(b) - F(a) \le U^*(N) \]

for every N ∈ Z*_+. Taking standard parts with N an infinite hyperinteger,
we therefore obtain

\[ \int_a^b f(x)\,dx \;\le\; F(b) - F(a) \;\le\; \int_a^b f(x)\,dx, \]

because the hypothesis that f is integrable means that st(L*(N)) =
st(U*(N)) = ∫_a^b f(x) dx.

The title of this final section of our long account of the history of the
calculus should not be taken too literally. It is true, as the above discussion 
suggests, that non-standard analysis can be employed to convert most of 
the intuitive infinitesimal arguments of the seventeenth and eighteenth 
centuries into logically precise arguments. But this is an a posteriori 
interpretation in terms of twentieth century mathematical thought rather 
than a “vindication” of the seventeenth and eighteenth centuries on their 
own terms. My own view is that non-standard analysis is a significant 
development of contemporary mathematics with more implications for the 
future than for the past. Nevertheless, the non-standard analysis of our 
time has given clearcut answers to some of the most venerable questions of 
our subject, and it cannot be denied that this success provides both a 
palpable satisfaction and a perfect ending for a book on the historical 
development of the calculus. 